Hugging Face / PyTorch versions on SageMaker

I'm trying to use the Hugging Face estimator on SageMaker, following the Run training on Amazon SageMaker docs, e.g.:

from sagemaker.huggingface import HuggingFace

# create the Estimator (role and hyperparameters defined elsewhere)
huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.17',
    pytorch_version='1.10',
    py_version='py38',
    hyperparameters=hyperparameters,
)

When I tried to increase the version to transformers_version='4.24', it threw an error saying that the maximum supported version is 4.17.

Is there a page that lists the versions that SageMaker supports?

What are the possible versions for the following arguments?

  • transformers_version
  • pytorch_version
  • py_version

Also, are CPU instances supported by the Hugging Face estimator on SageMaker?

Hi @alvations - this thread should hopefully answer all your questions :smiley:
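
If you want to check a specific combination programmatically, one option (a sketch using the sagemaker Python SDK; the exact version strings, e.g. whether it's 4.17 or 4.17.0, may differ) is to ask the SDK to resolve the container image - an unsupported combination raises a ValueError that lists the valid choices. The published images are also listed in the aws/deep-learning-containers repo (available_images.md).

from sagemaker import image_uris

# Resolve the Hugging Face training container for one combination;
# if it isn't supported, this raises a ValueError listing valid versions.
image_uri = image_uris.retrieve(
    framework="huggingface",
    region="us-east-1",                      # example region
    version="4.17.0",                        # transformers_version
    base_framework_version="pytorch1.10.2",  # pytorch_version
    py_version="py38",
    instance_type="ml.p3.2xlarge",
    image_scope="training",
)
print(image_uri)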


Thank you for the prompt reply!

There’s a pointer in the thread to a requirements.txt for deployment when loading the model; is there documentation on using a requirements.txt with the Hugging Face estimator for training?

Yup, that would be in this post :wink:
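
The short version, as far as I know (toolkit behaviour can vary by version): for training, a requirements.txt placed at the top level of source_dir is pip-installed inside the container before the entry point runs, so the layout would look something like:

scripts/
├── train.py           # entry_point
└── requirements.txt   # e.g. a single line: transformers==4.24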

@marshmellow77, I’ve followed the post GitHub - aws/sagemaker-huggingface-inference-toolkit and tried a few combinations.

Is it right that the only way for me to use a transformers version newer than 4.17 with a Trainer object is to:

  • first load the model in Python
  • save it to disk in a directory
  • add a requirements.txt in that directory
  • load the model with the requirements.txt

In code, something like:

import os

from transformers import EncoderDecoderModel

# build the encoder-decoder model from two pretrained BERT checkpoints
multibert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-uncased", "bert-base-multilingual-uncased"
)

# EncoderDecoderModel has no save_to_disk; save_pretrained writes the
# config and weights to the directory
multibert.save_pretrained("my-model/")

# pin the transformers version for the container to install
os.makedirs("my-model/code", exist_ok=True)
with open("my-model/code/requirements.txt", "w") as fout:
    fout.write("transformers==4.24")

Then inside my ./scripts/train-v4-24.py, load the model like:

from transformers import EncoderDecoderModel, Trainer

# Instead of
#   multibert = EncoderDecoderModel.from_encoder_decoder_pretrained(
#       "bert-base-multilingual-uncased", "bert-base-multilingual-uncased"
#   )
# load the saved model; EncoderDecoderModel has no load_from_disk,
# so use from_pretrained

def train():
    multibert = EncoderDecoderModel.from_pretrained("my-model/")

    trainer = Trainer(
        model=multibert,                  # was model=model, an undefined name
        args=training_args,               # training_args, datasets, tokenizer
        compute_metrics=compute_metrics,  # and compute_metrics defined elsewhere
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        tokenizer=tokenizer,
    )

    trainer.train()

if __name__ == "__main__":  # was def __main__(), which would never run
    train()

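(I assume I’d also need to get my-model/ into the training container somehow, e.g. by uploading it to S3 and passing it as an input channel to fit(). The channel name model here is just my guess, but SageMaker does expose each channel’s local path through an SM_CHANNEL_* environment variable:)

import os

from transformers import EncoderDecoderModel

# If the estimator is launched with
#   huggingface_estimator.fit({"model": "s3://my-bucket/my-model/"})
# the channel is mounted at /opt/ml/input/data/model and its path is
# exposed as SM_CHANNEL_MODEL inside the container.
model_path = os.environ.get("SM_CHANNEL_MODEL", "my-model/")
multibert = EncoderDecoderModel.from_pretrained(model_path)
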
Then in SageMaker, I can create the estimator without the version arguments:

huggingface_estimator = HuggingFace(
    entry_point='train-v4-24.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    hyperparameters=hyperparameters,
)

Is the above the only way to use the Trainer object with versions newer than 4.17?

Also asked on Stack Overflow for a wider audience: docker - How to use AWS Sagemaker with newer version of Huggingface Estimator? - Stack Overflow