SageMaker doesn’t support argparse actions

Hi,

I noticed that there is a difference in how arguments are retrieved between the two scripts below, and I would like some explanation.

Script in “Prepare a :hugs: Transformers fine-tuning script”

SageMaker doesn’t support argparse actions: what does that mean?

The `hyperparameters` defined in the [Hugging Face Estimator](https://huggingface.co/docs/sagemaker/train#create-an-huggingface-estimator)
are passed as named arguments and processed by `ArgumentParser()` .

import transformers
import datasets
import argparse
import os

if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--per_device_train_batch_size", type=int, default=32)
    parser.add_argument("--model_name_or_path", type=str)

Note that SageMaker doesn’t support argparse actions. 
For example, if you want to use a boolean hyperparameter, 
specify type as bool in your script and provide an explicit True or False value.
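A minimal sketch of that workaround (the str_to_bool helper is my own, not from the docs): because every hyperparameter reaches the script as the string following --key, a plain type=bool would turn the string "False" into True, so an explicit converter is safer:

import argparse

def str_to_bool(value):
    # bool("False") is True in Python, so convert the literal
    # "True"/"False" strings from the command line ourselves
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # instead of action="store_true": an explicit value, e.g. --fp16 True
    parser.add_argument("--fp16", type=str_to_bool, default=False)
    args, _ = parser.parse_known_args()

In the estimator you would then pass hyperparameters={"fp16": True}, which reaches the script as --fp16 True.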

Script in transformers/examples/pytorch

For example, in the script run_ner.py, the arguments are defined differently.

from transformers import (
    (...),
    HfArgumentParser,
    (...)
)

(...)

@dataclass
class ModelArguments:
    """
    Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
    """

    model_name_or_path: str = field(
        metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
    )
    config_name: Optional[str] = field(
        default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
    )

  (...)

def main():
    # See all possible arguments in src/transformers/training_args.py
    # or by passing the --help flag to this script.
    # We now keep distinct sets of args, for a cleaner separation of concerns.

    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
    if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
        # If we pass only one argument to the script and it's the path to a json file,
        # let's parse it to get our arguments.
        model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
    else:
        model_args, data_args, training_args = parser.parse_args_into_dataclasses()
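In other words, the script accepts either a single JSON file holding all the argument values (python run_ner.py args.json) or ordinary command-line flags (python run_ner.py --model_name_or_path bert-base-uncased ...); the file name and flag values here are only illustrative.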

Could someone explain the differences, and whether in SageMaker we must rewrite the arguments section of the scripts in transformers/examples/pytorch/ as formulated in “Prepare a :hugs: Transformers fine-tuning script”, or not? Thanks.

This means you cannot use parser.add_argument("--args", action="store_true"): SageMaker always passes hyperparameters as --key value pairs on the command line, and a store_true flag takes no value. Declare the option with a type and pass an explicit value instead.


The HfArgumentParser is a custom implementation on top of argparse that makes it easy to create Python scripts for Transformers. You can use the HfArgumentParser if you want and feel confident with it; for example, with the HfArgumentParser you don’t need to define the TrainingArguments via parser.add_argument, since they are added behind the scenes.
For the SageMaker examples we went with the default argparse, since it is easier and faster to get started with for non-Transformers experts, and it might have been difficult to understand that you don’t need to define per_device_train_batch_size in train.py but can still use it as a hyperparameter in the notebook.
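To make the “behind the scenes” part concrete, here is a minimal self-contained sketch (the dataclass and its field are illustrative, not from run_ner.py):

from dataclasses import dataclass, field
from transformers import HfArgumentParser, TrainingArguments

@dataclass
class ModelArguments:
    model_name_or_path: str = field(
        metadata={"help": "Path to pretrained model or model identifier"}
    )

if __name__ == "__main__":
    # TrainingArguments is parsed automatically, so flags such as
    # --output_dir, --num_train_epochs and --per_device_train_batch_size
    # all work without a single add_argument call
    parser = HfArgumentParser((ModelArguments, TrainingArguments))
    model_args, training_args = parser.parse_args_into_dataclasses()

You would run it as python train.py --model_name_or_path bert-base-uncased --output_dir /opt/ml/model --per_device_train_batch_size 32.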

Could someone explain the differences, and whether in SageMaker we must rewrite the arguments section of the scripts in transformers/examples/pytorch/ as formulated in “Prepare a :hugs: Transformers fine-tuning script”, or not?

No, you don’t need to rewrite them. Since the HfArgumentParser creates the add_argument calls behind the scenes, it works with SageMaker, so you can decide how you would like to structure your script.
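For example (the values here are illustrative), hyperparameters={"model_name_or_path": "bert-base-uncased", "output_dir": "/opt/ml/model"} reaches an HfArgumentParser-based script as --model_name_or_path bert-base-uncased --output_dir /opt/ml/model, which parse_args_into_dataclasses() consumes directly.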
