SageMaker doesn’t support argparse actions

Hi,

I noticed that there is a difference in how arguments are retrieved between the two scripts below, and I would like some explanation.

Script in “Prepare a :hugs: Transformers fine-tuning script”

SageMaker doesn’t support argparse actions: what does that mean?

The `hyperparameters` defined in the [Hugging Face Estimator](https://huggingface.co/docs/sagemaker/train#create-an-huggingface-estimator)
are passed as named arguments and processed by `ArgumentParser()` .

import transformers
import datasets
import argparse
import os

if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--per_device_train_batch_size", type=int, default=32)
    parser.add_argument("--model_name_or_path", type=str)

Note that SageMaker doesn’t support argparse actions. 
For example, if you want to use a boolean hyperparameter, 
specify type as bool in your script and provide an explicit True or False value.
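A minimal sketch of that workaround (the str_to_bool helper is my own, not from the docs): because every hyperparameter reaches the script as the string following --key, a plain type=bool would turn the string "False" into True, so an explicit converter is safer:

import argparse

def str_to_bool(value):
    # bool("False") is True in Python, so convert the literal
    # "True"/"False" strings from the command line ourselves
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # instead of action="store_true": an explicit value, e.g. --fp16 True
    parser.add_argument("--fp16", type=str_to_bool, default=False)
    args, _ = parser.parse_known_args()

In the estimator you would then pass hyperparameters={"fp16": True}, which reaches the script as --fp16 True.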

Script in transformers/examples/pytorch

For example, in the script run_ner.py, the arguments are defined differently.

from transformers import (
    (...),
    HfArgumentParser,
    (...)
)

(...)

@dataclass
class ModelArguments:
    """
    Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
    """

    model_name_or_path: str = field(
        metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
    )
    config_name: Optional[str] = field(
        default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
    )

  (...)

def main():
    # See all possible arguments in src/transformers/training_args.py
    # or by passing the --help flag to this script.
    # We now keep distinct sets of args, for a cleaner separation of concerns.

    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
    if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
        # If we pass only one argument to the script and it's the path to a json file,
        # let's parse it to get our arguments.
        model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
    else:
        model_args, data_args, training_args = parser.parse_args_into_dataclasses()
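In other words, the script accepts either a single JSON file holding all the argument values (python run_ner.py args.json) or ordinary command-line flags (python run_ner.py --model_name_or_path bert-base-uncased ...); the file name and flag values here are only illustrative.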

Could someone explain the differences, and whether in SageMaker we must rewrite the arguments section of the scripts in transformers/examples/pytorch/ as formulated in “Prepare a :hugs: Transformers fine-tuning script”, or not? Thanks.

This means you cannot use parser.add_argument("--args", action="store_true"): SageMaker always passes hyperparameters as --key value pairs on the command line, and a store_true flag takes no value. Declare the option with a type and pass an explicit value instead.


The HfArgumentParser is a custom implementation on top of argparse that makes it easy to create Python scripts for Transformers. You can use the HfArgumentParser if you want and feel confident with it; for example, with the HfArgumentParser you don’t need to define the TrainingArguments via parser.add_argument, since they are added behind the scenes.
For the SageMaker examples we went with the default argparse, since it is easier and faster to get started with for non-Transformers experts, and it might have been difficult to understand that you don’t need to define per_device_train_batch_size in train.py but can still use it as a hyperparameter in the notebook.
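To make the “behind the scenes” part concrete, here is a minimal self-contained sketch (the dataclass and its field are illustrative, not from run_ner.py):

from dataclasses import dataclass, field
from transformers import HfArgumentParser, TrainingArguments

@dataclass
class ModelArguments:
    model_name_or_path: str = field(
        metadata={"help": "Path to pretrained model or model identifier"}
    )

if __name__ == "__main__":
    # TrainingArguments is parsed automatically, so flags such as
    # --output_dir, --num_train_epochs and --per_device_train_batch_size
    # all work without a single add_argument call
    parser = HfArgumentParser((ModelArguments, TrainingArguments))
    model_args, training_args = parser.parse_args_into_dataclasses()

You would run it as python train.py --model_name_or_path bert-base-uncased --output_dir /opt/ml/model --per_device_train_batch_size 32.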

Could someone explain the differences, and whether in SageMaker we must rewrite the arguments section of the scripts in transformers/examples/pytorch/ as formulated in “Prepare a :hugs: Transformers fine-tuning script”, or not?

No, you don’t need to rewrite them. Since the HfArgumentParser creates the add_argument calls behind the scenes, it works with SageMaker, so you can decide how you would like to structure your script.
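For example (the values here are illustrative), hyperparameters={"model_name_or_path": "bert-base-uncased", "output_dir": "/opt/ml/model"} reaches an HfArgumentParser-based script as --model_name_or_path bert-base-uncased --output_dir /opt/ml/model, which parse_args_into_dataclasses() consumes directly.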
