When trying to use the Hugging Face estimator on SageMaker, following Run training on Amazon SageMaker, e.g.
# create the Estimator
huggingface_estimator = HuggingFace(
    entry_point='train.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    transformers_version='4.17',
    pytorch_version='1.10',
    py_version='py38',
    hyperparameters=hyperparameters,
)
When I tried to increase the version to transformers_version='4.24', it throws an error saying the maximum supported version is 4.17. Is there a page that lists the versions that SageMaker supports?
What are the possible versions for the following arguments?
- transformers_version
- pytorch_version
- py_version
Also, are CPU instances supported by the Hugging Face estimator on SageMaker?
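As an aside, one way to probe which combinations the sagemaker SDK knows about is to ask sagemaker.image_uris to resolve an image URI; unsupported combinations raise a ValueError that lists the supported values. A minimal sketch (the version strings below are only examples, and base_framework_version must match the underlying PyTorch build):

from sagemaker import image_uris

# Resolves to an ECR image URI if the combination is supported;
# raises a ValueError listing the supported values otherwise.
uri = image_uris.retrieve(
    framework="huggingface",
    region="us-east-1",
    version="4.17.0",                        # transformers version
    base_framework_version="pytorch1.10.2",  # underlying framework build
    py_version="py38",
    instance_type="ml.p3.2xlarge",
    image_scope="training",
)
print(uri)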
Hi @alvations - this thread should hopefully answer all your questions.
Thank you for the prompt reply!
There’s a pointer in the thread to requirements.txt for deployment when loading the model; is there documentation on using requirements.txt with the Hugging Face estimator for training?
Yup, that would be in this post.
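For what it's worth, in training the entry point is run by the sagemaker-training-toolkit, which, as far as I understand, pip-installs a requirements.txt found at the top level of source_dir before the entry point starts. A minimal sketch, assuming the layout from the estimator above:

./scripts/
├── train.py            # entry_point
└── requirements.txt    # pip-installed by the training toolkit before train.py runs

So pinning transformers==4.24.0 in ./scripts/requirements.txt should upgrade the library inside the training container before the script imports it.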
@marshmellow77, I’ve followed the post GitHub - aws/sagemaker-huggingface-inference-toolkit and tried a few combinations.
Is it right that the only way for me to use a transformers version >4.17 with a Trainer object is to:
- first load the model in Python
- save it to disk in a directory
- add the requirements.txt in that directory
- load the model with the requirements.txt
In code, something like:
import os
import torch
from datasets import load_dataset
from transformers import EncoderDecoderModel
from transformers import AutoTokenizer
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

multibert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-multilingual-uncased", "bert-base-multilingual-uncased"
)

# EncoderDecoderModel is saved with save_pretrained
# (save_to_disk is a datasets method, not a transformers one).
multibert.save_pretrained("my-model/")

# Pin the transformers version for the container to install.
os.makedirs("my-model/code", exist_ok=True)
with open("my-model/code/requirements.txt", "w") as fout:
    fout.write("transformers==4.24")
Then inside my ./scripts/train-v4-24.py, load the model like:
from transformers import EncoderDecoderModel, Trainer

def train():
    # Instead of
    #   multibert = EncoderDecoderModel.from_encoder_decoder_pretrained(
    #       "bert-base-multilingual-uncased", "bert-base-multilingual-uncased"
    #   )
    # load the saved model from disk; since it was written with
    # save_pretrained, it is read back with from_pretrained.
    multibert = EncoderDecoderModel.from_pretrained("my-model/")
    # training_args, compute_metrics, the datasets, and the tokenizer
    # are defined elsewhere in the script.
    trainer = Trainer(
        model=multibert,
        args=training_args,
        compute_metrics=compute_metrics,
        train_dataset=train_dataset,
        eval_dataset=test_dataset,
        tokenizer=tokenizer,
    )
    trainer.train()

if __name__ == "__main__":
    train()
Then in SageMaker, I can create the estimator without the version arguments:
huggingface_estimator = HuggingFace(
    entry_point='train-v4-24.py',
    source_dir='./scripts',
    instance_type='ml.p3.2xlarge',
    instance_count=1,
    role=role,
    hyperparameters=hyperparameters,
)
Is the above the only way to use the Trainer object with versions >4.17?
I published a repo that shows how you can extend the existing HF DLCs in order to use transformers versions higher than 4.17.0. The repo shows how to extend the Inference Container, but the same applies to the Training DLC; for training it boils down to something like the sketch below. Hope that helps!
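A minimal Dockerfile along those lines (the account/region/tag in the FROM line are placeholders; look up the exact training DLC URI for your region and framework combination in aws/deep-learning-containers):

# Extend the AWS Hugging Face training DLC (the URI/tag below is a placeholder).
FROM 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:1.10.2-transformers4.17.0-gpu-py38-cu113-ubuntu20.04

# Install the newer transformers version on top of the base image.
RUN pip install --no-cache-dir transformers==4.24.0

After building and pushing this image to ECR, it can be passed to the estimator via image_uri, as in the post below.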
Pardon the hiatus. After trying out different methods, I’ve managed to get different transformers versions to work using either:
Custom Docker with image_uri
Reference: Adapting Your Own Training Container - Amazon SageMaker
Supported version of Hugging Face: any version that you install in your Docker image.
from sagemaker.huggingface import HuggingFace
import sagemaker
import boto3

# Look up the account, region, and execution role.
client = boto3.client('sts')
account = client.get_caller_identity()['Account']
sess = boto3.session.Session()
role = sagemaker.get_execution_role()
region = sess.region_name

# URI of the custom image pushed to ECR.
image_name = "huggingface-custom"
tag = "latest"
ecr_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/{image_name}:{tag}"

estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    instance_type="ml.p3.8xlarge",
    instance_count=1,
    role=role,
    image_uri=ecr_uri,  # Custom image.
    py_version="py38",  # Somehow this is still needed.
)
estimator.fit()
Using SageMaker Training Compiler containers
Reference: deep-learning-containers/available_images.md at master · aws/deep-learning-containers · GitHub
Supported version of Huggingface: 4.21.1
from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    instance_type="ml.p3.8xlarge",
    instance_count=1,
    role=role,
    pytorch_version="1.11",
    transformers_version="4.21",
    py_version="py38",
    compiler_config=TrainingCompilerConfig(),       # Needed to use the Training Compiler.
    distribution={'pytorchxla': {'enabled': True}}, # Enables the optimized training backend.
)
Thanks for sharing. Great job on the blog post!