Hi,

I noticed that there is a difference in how arguments are retrieved between:

- the script in “Prepare a Transformers fine-tuning script”, which uses `argparse.ArgumentParser()` and `parser.add_argument()`,
- the scripts in transformers/examples/pytorch/, which use `HfArgumentParser()` and `parser.parse_args_into_dataclasses()`,

but I need some explanation.
Script in “Prepare a Transformers fine-tuning script”
“SageMaker doesn’t support argparse actions”: what does it mean?
The `hyperparameters` defined in the [Hugging Face Estimator](https://huggingface.co/docs/sagemaker/train#create-an-huggingface-estimator)
are passed as named arguments and processed by `ArgumentParser()`.
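For reference, the Estimator side looks roughly like this (a sketch adapted from the linked docs; the model name, versions, and role ARN are purely illustrative):

```python
from sagemaker.huggingface import HuggingFace

# each hyperparameter becomes a command-line pair:
# --epochs 3 --per_device_train_batch_size 32 --model_name_or_path ...
hyperparameters = {
    "epochs": 3,
    "per_device_train_batch_size": 32,
    "model_name_or_path": "distilbert-base-uncased",  # illustrative value
}

huggingface_estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    instance_type="ml.p3.2xlarge",
    instance_count=1,
    role="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder IAM role
    transformers_version="4.6",  # illustrative versions
    pytorch_version="1.7",
    py_version="py36",
    hyperparameters=hyperparameters,
)
```

The training script then reads them back with `argparse`: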
```python
import transformers
import datasets
import argparse
import os

if __name__ == "__main__":

    parser = argparse.ArgumentParser()

    # hyperparameters sent by the client are passed as command-line arguments to the script
    parser.add_argument("--epochs", type=int, default=3)
    parser.add_argument("--per_device_train_batch_size", type=int, default=32)
    parser.add_argument("--model_name_or_path", type=str)
```
Note that SageMaker doesn’t support argparse actions. For example, if you want to use a boolean hyperparameter, specify `type` as `bool` in your script and provide an explicit `True` or `False` value.
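As I understand it (a minimal sketch, not taken from the docs): SageMaker always sends each hyperparameter as a `--name value` pair, so a value-less flag defined with `action="store_true"` would fail on the extra value; and plain `type=bool` is itself a pitfall, since `bool("False")` is `True`, so an explicit string-to-bool conversion is safer:

```python
import argparse

def str_to_bool(value):
    # plain type=bool would be wrong here: bool("False") is True,
    # so convert the string SageMaker sends explicitly
    return str(value).lower() in ("true", "1", "yes")

parser = argparse.ArgumentParser()
# works with SageMaker, which always sends "--fp16 True" or "--fp16 False"
parser.add_argument("--fp16", type=str_to_bool, default=False)
# would NOT work: SageMaker never sends a bare "--fp16" without a value
# parser.add_argument("--fp16", action="store_true")

args = parser.parse_args(["--fp16", "False"])
print(args.fp16)  # False
```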
Script in transformers/examples/pytorch
For example, in the script run_ner.py, the formulation is different.
```python
from transformers import (
    (...),
    HfArgumentParser,
    (...)
)

(...)

@dataclass
class ModelArguments:
    """
    Arguments pertaining to which model/config/tokenizer we are going to fine-tune from.
    """

    model_name_or_path: str = field(
        metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
    )
    config_name: Optional[str] = field(
        default=None, metadata={"help": "Pretrained config name or path if not the same as model_name"}
    )

(...)

def main():
    # See all possible arguments in src/transformers/training_args.py
    # or by passing the --help flag to this script.
    # We now keep distinct sets of args, for a cleaner separation of concerns.
    parser = HfArgumentParser((ModelArguments, DataTrainingArguments, TrainingArguments))
    if len(sys.argv) == 2 and sys.argv[1].endswith(".json"):
        # If we pass only one argument to the script and it's the path to a json file,
        # let's parse it to get our arguments.
        model_args, data_args, training_args = parser.parse_json_file(json_file=os.path.abspath(sys.argv[1]))
    else:
        model_args, data_args, training_args = parser.parse_args_into_dataclasses()
```
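From what I can tell, `HfArgumentParser` subclasses `argparse.ArgumentParser` and generates one `--argument` per dataclass field, so it accepts the same `--key value` pairs. A minimal self-contained sketch (the explicit `args=` list stands in for `sys.argv`, which `parse_args_into_dataclasses()` reads by default):

```python
from dataclasses import dataclass, field
from typing import Optional

from transformers import HfArgumentParser

@dataclass
class ModelArguments:
    model_name_or_path: str = field(
        metadata={"help": "Path to pretrained model or model identifier from huggingface.co/models"}
    )
    config_name: Optional[str] = field(default=None, metadata={"help": "Pretrained config name"})

parser = HfArgumentParser(ModelArguments)

# returns one parsed dataclass instance per dataclass passed to the parser
(model_args,) = parser.parse_args_into_dataclasses(
    args=["--model_name_or_path", "distilbert-base-uncased"]
)
print(model_args.model_name_or_path)  # distilbert-base-uncased
```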
Could someone explain the differences, and whether on SageMaker we must rewrite the argument-parsing section of the scripts in transformers/examples/pytorch/ into the form used in “Prepare a Transformers fine-tuning script”, or not? Thanks.