When trying to use the Hugging Face estimator on SageMaker, following Run training on Amazon SageMaker, e.g.
# create the Estimator
huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.17',
    pytorch_version='1.10',
    py_version='py38',
    hyperparameters=hyperparameters,
)
When I tried to increase the version to transformers_version='4.24', it throws an error saying the maximum supported version is 4.17. Is there a page that lists the versions that SageMaker supports?
What are the possible versions for the following arguments?
- transformers_version
- pytorch_version
- py_version
Also, are CPU instances supported by the Hugging Face estimator on SageMaker?
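As an aside, one way to probe which combinations the sagemaker SDK knows about is to ask sagemaker.image_uris to resolve an image URI; unsupported combinations raise a ValueError that lists the supported values. A minimal sketch (the version strings below are only examples, and base_framework_version must match the underlying PyTorch build):

from sagemaker import image_uris

# Resolves to an ECR image URI if the combination is supported;
# raises a ValueError listing the supported values otherwise.
uri = image_uris.retrieve(
    framework="huggingface",
    region="us-east-1",
    version="4.17.0",                        # transformers version
    base_framework_version="pytorch1.10.2",  # underlying framework build
    py_version="py38",
    instance_type="ml.p3.2xlarge",
    image_scope="training",
)
print(uri)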
Hi @alvations - this thread should hopefully answer all your questions.
Thank you for the prompt reply!
There’s a pointer in the thread to requirements.txt for deployment when loading the model; is there documentation on using requirements.txt with the Hugging Face estimator for training?
Yup, that would be in this post.
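For what it's worth, in training the entry point is run by the sagemaker-training-toolkit, which, as far as I understand, pip-installs a requirements.txt found at the top level of source_dir before the entry point starts. A minimal sketch, assuming the layout from the estimator above:

./scripts/
├── train.py            # entry_point
└── requirements.txt    # pip-installed by the training toolkit before train.py runs

So pinning transformers==4.24.0 in ./scripts/requirements.txt should upgrade the library inside the training container before the script imports it.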
@marshmellow77, I’ve followed the post GitHub - aws/sagemaker-huggingface-inference-toolkit and tried a few combinations.
Is it right that the only way for me to use a transformers version >4.17 with a Trainer object is to:
- first load the model in Python
- save it to disk in a directory
- add the requirements.txt in that directory
- load the model with the requirements.txt
In code, something like:
import os
import torch
from datasets import load_dataset
from transformers import EncoderDecoderModel
from transformers import AutoTokenizer
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

multibert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-uncased", "bert-base-multilingual-uncased"
)

# EncoderDecoderModel is saved with save_pretrained
# (save_to_disk is a datasets method, not a transformers one).
multibert.save_pretrained("my-model/")

# Pin the transformers version for the container to install.
os.makedirs("my-model/code", exist_ok=True)
with open("my-model/code/requirements.txt", "w") as fout:
    fout.write("transformers==4.24")
Then inside my ./scripts/train-v4-24.py, load the model like:
from transformers import EncoderDecoderModel, Trainer

def train():
    # Instead of
    #   multibert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    #       "bert-base-multilingual-uncased", "bert-base-multilingual-uncased"
    #   )
    # load the saved model from disk; since it was written with
    # save_pretrained, it is read back with from_pretrained.
    multibert = EncoderDecoderModel.from_pretrained("my-model/")
    # training_args, compute_metrics, the datasets, and the tokenizer
    # are defined elsewhere in the script.
    trainer = Trainer(
        model=multibert,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        tokenizer=tokenizer,
    )
    trainer.train()

if __name__ == "__main__":
    train()
Then in SageMaker, I can create the estimator without the version arguments:
huggingface_estimator = HuggingFace(
    entry_point='train-v4-24.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    hyperparameters=hyperparameters,
)
Is the above the only way to use the Trainer object with versions >4.17?
I published a repo that shows how you can extend the existing HF DLCs in order to use transformers versions higher than 4.17.0. The repo shows how to extend the Inference Container, but the same applies to the Training DLC; for training it boils down to something like the sketch below. Hope that helps!
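A minimal Dockerfile along those lines (the account/region/tag in the FROM line are placeholders; look up the exact training DLC URI for your region and framework combination in aws/deep-learning-containers):

# Extend the AWS Hugging Face training DLC (the URI/tag below is a placeholder).
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04

# Install the newer transformers version on top of the base image.
RUN pip install --no-cache-dir transformers==4.24.0

After building and pushing this image to ECR, it can be passed to the estimator via image_uri, as in the post below.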
Pardon the hiatus. After trying out different methods, I’ve managed to get different transformers versions to work using either:
Custom Docker with image_uri
Reference: Adapting Your Own Training Container - Amazon SageMaker
Supported version of Hugging Face: any version that you install in your Docker image.
from sagemaker.huggingface import HuggingFace
import sagemaker
import boto3

# Look up the account, region, and execution role.
client = boto3.client('sts')
account = client.get_caller_identity()['Account']
sess = boto3.session.Session()
role = sagemaker.get_execution_role()
region = sess.region_name

# URI of the custom image pushed to ECR.
image_name = "huggingface-custom"
tag = "latest"
ecr_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/{image_name}:{tag}"

estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    instance_type="ml.p3.8xlarge",
    instance_count=1,
    role=role,
    image_uri=ecr_uri,  # Custom image.
    py_version="py38",  # Somehow this is still needed.
)
estimator.fit()
Using SageMaker Training Compiler containers
Reference: deep-learning-containers/available_images.md at master · aws/deep-learning-containers · GitHub
Supported version of Huggingface: 4.21.1
from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    instance_type="ml.p3.8xlarge",
    instance_count=1,
    role=role,
    pytorch_version="1.11",
    transformers_version="4.21",
    py_version="py38",
    compiler_config=TrainingCompilerConfig(),       # Needed to use the Training Compiler.
    distribution={'pytorchxla': {'enabled': True}}, # Enables the optimized training backend.
)
Thanks for sharing. Great job on the blog post!