Huggingface Training Containers

jyothilolla · March 15, 2024, 6:13am

I'm performing domain adaptation using the code below:

from sagemaker.huggingface import HuggingFace

huggingface_estimator = HuggingFace(
    entry_point='Domain.py',            # training script
    source_dir='scripts',               # directory containing all files needed for training
    instance_type='ml.g5.2xlarge',      # instance type used for the training job
    instance_count=1,                   # number of instances used for training
    base_job_name=job_name,             # name of the training job
    role=role,                          # IAM role used by the training job to access AWS resources, e.g. S3
    # checkpoint_s3_uri=f's3://{sess.default_bucket()}/checkpoints',
    volume_size=300,                    # size of the EBS volume in GB
    transformers_version='4.28',        # transformers version used in the training job
    pytorch_version='2.0',              # PyTorch version used in the training job
    py_version='py310',                 # Python version used in the training job
    hyperparameters=hyperparameters,    # hyperparameters passed to the training job
    environment={"HUGGINGFACE_HUB_CACHE": "/tmp/.cache"},
)

data = {'training': training_input_path}
huggingface_estimator.fit(data, wait=True)

But I'm getting an error during training, and I figured out that it is caused by Python library version issues. I need to train with transformers_version='4.36' and pytorch_version='2.1'.

When I use those versions, I get the following error:

ValueError: Unsupported huggingface version: 4.36. You may need to upgrade your SDK version (pip install -U sagemaker) for newer huggingface versions. Supported huggingface version(s): 4.4.2, 4.5.0, 4.6.1, 4.10.2, 4.11.0, 4.12.3, 4.17.0, 4.26.0, 4.28.1, 4.4, 4.5, 4.6, 4.10, 4.11, 4.12, 4.17, 4.26, 4.28.

But I can see AWS DLCs with those versions listed in available_images.md in the aws/deep-learning-containers repository on GitHub.

Training image I need: 763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-training:2.1.0-transformers4.36.0-gpu-py310-cu121-ubuntu20.04

How do I use this image in my code?
Please help. Thanks in advance!
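
For reference, one possible approach is sketched below, under assumptions. The SageMaker Python SDK estimators accept an image_uri argument, and passing the container image explicitly should bypass the SDK's own lookup of supported transformers/PyTorch versions, which is where the ValueError above is raised. The helper names build_estimator_kwargs and make_estimator are hypothetical and only there for illustration; the image URI is the one from this post and is specific to us-east-1. Whether Domain.py runs cleanly inside that container has not been verified here.

```python
# Image URI copied from the post above; it is region-specific (us-east-1).
TRAINING_IMAGE_URI = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-training:"
    "2.1.0-transformers4.36.0-gpu-py310-cu121-ubuntu20.04"
)


def build_estimator_kwargs(job_name, role, hyperparameters,
                           image_uri=TRAINING_IMAGE_URI):
    """Collect estimator arguments (hypothetical helper).

    The container is pinned with image_uri, so transformers_version and
    pytorch_version are left out on purpose.
    """
    return {
        "entry_point": "Domain.py",
        "source_dir": "scripts",
        "instance_type": "ml.g5.2xlarge",
        "instance_count": 1,
        "base_job_name": job_name,
        "role": role,
        "volume_size": 300,
        "image_uri": image_uri,      # explicit DLC image instead of version strings
        "py_version": "py310",       # still passed; assumed to match the image tag
        "hyperparameters": hyperparameters,
        "environment": {"HUGGINGFACE_HUB_CACHE": "/tmp/.cache"},
    }


def make_estimator(**kwargs):
    """Build the estimator (hypothetical helper).

    The SDK is imported lazily so the kwargs helper above can be inspected
    without the sagemaker package installed.
    """
    from sagemaker.huggingface import HuggingFace
    return HuggingFace(**kwargs)
```

With job_name, role, hyperparameters and training_input_path defined as in the original code, usage would look like make_estimator(**build_estimator_kwargs(job_name, role, hyperparameters)).fit({'training': training_input_path}, wait=True). The other route, suggested by the error message itself, is to upgrade the SDK with pip install -U sagemaker so that newer versions are recognized.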
