Pardon the hiatus. After trying out different methods, I’ve managed to get different transformers versions to work using either:
Custom Docker image with image_uri
Reference: Adapting Your Own Training Container - Amazon SageMaker
Supported Hugging Face transformers version: any version you install when building your Docker image.
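One caveat: for entry_point/source_dir to work with a custom image, the image needs the SageMaker training toolkit installed (pip install sagemaker-training in your Dockerfile; see the "Adapting Your Own Training Container" reference above). It also helps to log the library version at the top of train.py so you can confirm from the CloudWatch logs that the job really ran your image. A minimal sketch, assuming transformers was installed at image build time:

# train.py (sketch)
import transformers

if __name__ == "__main__":
    # Printed to the job's CloudWatch logs, confirming the custom image's version.
    print(f"transformers version: {transformers.__version__}")
    # ... actual training code ...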
from sagemaker.huggingface import HuggingFace
import sagemaker
import boto3

# Look up the account and region to build the ECR URI of the custom image.
client = boto3.client('sts')
account = client.get_caller_identity()['Account']
sess = boto3.session.Session()
role = sagemaker.get_execution_role()
region = sess.region_name

image_name = "huggingface-custom"
tag = "latest"
ecr_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/{image_name}:{tag}"
estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    instance_type="ml.p3.8xlarge",
    instance_count=1,
    role=role,
    image_uri=ecr_uri,  # Custom image pushed to ECR.
    py_version="py38",  # The SDK still requires this even with image_uri set.
)
estimator.fit()
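If train.py reads training data from input channels, pass their S3 locations to fit() instead; a small usage sketch, where the bucket and prefixes are placeholders:

# Channel names must match what train.py reads from /opt/ml/input/data/<channel>.
estimator.fit({
    "train": "s3://my-bucket/data/train",
    "test": "s3://my-bucket/data/test",
})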
Using SageMaker Training Compiler Containers
References:
- Amazon SageMaker Training Compiler - Amazon SageMaker
- amazon-sagemaker-examples/gpt-2.ipynb at main · aws/amazon-sagemaker-examples · GitHub
- Available images: deep-learning-containers/available_images.md at master · aws/deep-learning-containers · GitHub
Supported Hugging Face transformers version: 4.21.1
from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig
estimator = HuggingFace(
    entry_point="train.py",
    source_dir="./scripts",
    instance_type="ml.p3.8xlarge",
    instance_count=1,
    role=role,
    pytorch_version="1.11.0",
    transformers_version="4.21",
    py_version="py38",
    compiler_config=TrainingCompilerConfig(),        # Enables the Training Compiler.
    distribution={"pytorchxla": {"enabled": True}},  # Enables optimized (XLA) training.
)
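Before launching, you can ask the estimator which Training Compiler container those versions resolve to, then start the job as usual; a quick sanity check, assuming a reasonably recent sagemaker SDK:

# Print the resolved deep learning container URI.
print(estimator.training_image_uri())

estimator.fit()

Since the compiler typically lowers GPU memory usage, it is also worth revisiting the batch size once it is enabled.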