Huggingface / Pytorch versions on Sagemaker

When trying to use the Hugging Face estimator on SageMaker, following Run training on Amazon SageMaker, e.g.:

from sagemaker.huggingface import HuggingFace

# create the Estimator
huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.17',
    pytorch_version='1.10',
    py_version='py38',
    hyperparameters=hyperparameters
)

When I tried to increase the version to transformers_version='4.24', it threw an error saying the maximum supported version is 4.17.

Is there a page that lists the versions that SageMaker supports?

What are the possible versions for the following arguments?

  • transformers_version
  • pytorch_version
  • py_version

Also, are CPU instances supported by the Hugging Face estimator on SageMaker?
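(One way to check locally which combinations the installed sagemaker SDK knows about is to read the image URI config bundled inside the package. A minimal sketch, assuming the SDK still ships image_uri_config/huggingface.json and its layout hasn't changed:)

import json
import pkgutil

# Sketch: read the Hugging Face image config bundled with the sagemaker SDK.
# (Assumption: the SDK still ships this JSON at this path.)
raw = pkgutil.get_data("sagemaker", "image_uri_config/huggingface.json")
config = json.loads(raw)

# Each transformers version maps to its supported base frameworks
# (e.g. pytorch1.10.2), which in turn list the Python versions.
for version, spec in config["training"]["versions"].items():
    frameworks = [k for k in spec if k != "version_aliases"]
    print(version, "->", frameworks)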

Hi @alvations - this thread should hopefully answer all your questions :smiley:


Thank you for the prompt reply!

There’s a pointer in that thread to a requirements.txt for deployment when loading the model; is there any documentation on using a requirements.txt with the Hugging Face estimator for training?

Yup, that would be in this post :wink:
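In short (a sketch, not the official docs): the SageMaker training toolkit pip-installs a requirements.txt found at the top level of source_dir before it runs the entry point, so pinning a different transformers version can be as simple as this; whether a newer transformers resolves cleanly against the container's pinned dependencies is a separate question, which is what the rest of this thread is about.

from pathlib import Path

# Sketch: pin a newer transformers for training by dropping a
# requirements.txt next to the entry point inside source_dir.
# (The training toolkit pip-installs source_dir/requirements.txt
# before launching train.py; the pin below is just an example.)
Path("scripts").mkdir(exist_ok=True)
Path("scripts/requirements.txt").write_text("transformers==4.24.0\n")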

@marshmellow77, I’ve followed the post GitHub - aws/sagemaker-huggingface-inference-toolkit and tried a few combinations.

Is it right that the only way for me to use a transformers version >4.17 with a Trainer object is to:

  • first load the model in Python
  • save it to disk in a directory
  • add a requirements.txt in that directory
  • load the model with the requirements.txt

In code, something like:

import os

from transformers import EncoderDecoderModel

# Load the model once in a regular Python session...
multibert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-uncased", "bert-base-multilingual-uncased"
)

# ...and save it to disk (save_pretrained is the transformers API for this).
multibert.save_pretrained("my-model/")

# Pin the newer transformers version alongside the saved model.
os.makedirs("my-model/code", exist_ok=True)
with open("my-model/code/requirements.txt", "w") as fout:
    fout.write("transformers==4.24")

Then inside my ./scripts/train-v4-24.py, load the model like:

from transformers import EncoderDecoderModel, Trainer

def train():
    # Instead of
    #   multibert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    #       "bert-base-multilingual-uncased", "bert-base-multilingual-uncased"
    #   )
    # load the saved model (from_pretrained is the transformers API for this):
    multibert = EncoderDecoderModel.from_pretrained("my-model/")

    trainer = Trainer(
        model=multibert,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        tokenizer=tokenizer,
    )

    trainer.train()

if __name__ == "__main__":
    train()

Then in SageMaker, I can create the estimator without pinning the versions:

huggingface_estimator = HuggingFace(
    entry_point='train-v4-24.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    hyperparameters=hyperparameters
)

Is the above the only way to use the Trainer object with versions >4.17?

I’ve also asked this for a wider audience on Stack Overflow: docker - How to use AWS Sagemaker with newer version of Huggingface Estimator? - Stack Overflow

I published a repo that shows how you can extend the existing HF DLCs in order to use transformers versions higher than 4.17.0.

The repo shows how to extend the Inference Container, but the same applies to the Training DLC. Hope that helps!
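To make that concrete, a minimal sketch of the extension; the FROM tag is an assumption copied from the format in available_images.md, so look up the current tag for your region before building:

from pathlib import Path

# Sketch: extend an existing HF training DLC and upgrade transformers.
# The base image tag is an assumption -- verify it against
# aws/deep-learning-containers' available_images.md for your region.
dockerfile = (
    "FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04\n"
    "RUN pip install --no-cache-dir transformers==4.24.0\n"
)
Path("Dockerfile").write_text(dockerfile)

Build and push that image to your own ECR repository, then hand its URI to the estimator via image_uri (the custom Docker approach below does exactly that).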


Pardon the hiatus. After trying out different methods, I’ve managed to get different transformers versions to work using either of the following approaches:

Custom Docker with image_uri

Reference: Adapting Your Own Training Container - Amazon SageMaker

Supported version of Huggingface: Any version that you use to create your Docker image.

import boto3
import sagemaker
from sagemaker.huggingface import HuggingFace

client = boto3.client('sts')
account = client.get_caller_identity()['Account']
sess = boto3.session.Session()
role = sagemaker.get_execution_role()

region = sess.region_name
image_name = "huggingface-custom"
tag = "latest"

ecr_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/{image_name}:{tag}"

estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    instance_type="ml.p3.8xlarge",
    instance_count=1,
    role=role,
    image_uri=ecr_uri,  # Custom image.
    py_version="py38",  # Somehow this is still needed.
)

estimator.fit()
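For completeness: the image behind ecr_uri has to exist before fit() is called. A sketch of the standard build-and-push flow, assuming Docker is available locally and the huggingface-custom ECR repository has already been created:

import subprocess

# Log Docker in to your private ECR registry (standard ECR flow).
subprocess.run(
    f"aws ecr get-login-password --region {region} "
    f"| docker login --username AWS --password-stdin "
    f"{account}.dkr.ecr.{region}.amazonaws.com",
    shell=True, check=True,
)

# Build, tag, and push the custom image so ecr_uri resolves.
subprocess.run(["docker", "build", "-t", f"{image_name}:{tag}", "."], check=True)
subprocess.run(["docker", "tag", f"{image_name}:{tag}", ecr_uri], check=True)
subprocess.run(["docker", "push", ecr_uri], check=True)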

Using SageMaker Training Compiler Containers

Reference: Using images from deep-learning-containers/available_images.md at master · aws/deep-learning-containers · GitHub

Supported version of Huggingface: 4.21.1

from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    instance_type="ml.p3.8xlarge",
    instance_count=1,
    role=role,
    pytorch_version="1.11.0",
    transformers_version="4.21",
    py_version="py38",
    compiler_config=TrainingCompilerConfig(),  # Needed to use the Training Compiler.
    distribution={"pytorchxla": {"enabled": True}},  # Enables optimized (XLA) training.
)
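As with the custom-image approach, the job is then launched with:

# Kick off training; pass input channels here if train.py expects them.
estimator.fit()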

I also meant to share this blog post I wrote around the repo I mentioned earlier: Unlock the Latest Transformer Models with Amazon SageMaker | by Heiko Hotz | Dec, 2022 | Towards Data Science


Thanks for sharing. Great job on the blog post!