Hugging Face / PyTorch versions on SageMaker

Pardon the hiatus. After trying out different methods, I’ve managed to get different Transformers versions to work using either of the following:

Custom Docker with image_uri

Reference: Adapting Your Own Training Container - Amazon SageMaker

Supported Transformers version: any version you install when building your Docker image.

from sagemaker.huggingface import HuggingFace
import sagemaker
import boto3

client = boto3.client('sts')
account = client.get_caller_identity()['Account']
sess = boto3.session.Session()
role = sagemaker.get_execution_role()

region = sess.region_name
image_name = "huggingface-custom"
tag = "latest"

ecr_uri = f"{account}.dkr.ecr.{region}.amazonaws.com/{image_name}:{tag}"

estimator = HuggingFace(
  entry_point="train.py",
  source_dir="./scripts",
  instance_type="ml.p3.8large", 
  instance_count=1
  role=role,
  image_uri=ecr_uri,  # Custom image.
  py_version="py38"  # Somehow this is needed.
)

estimator.fit()
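
If the training data is already in S3, the same estimator can be pointed at it through fit(). A minimal sketch, where the bucket path and the "train" channel name are placeholders:

# Hypothetical S3 prefix for the training data; replace with your own bucket.
train_input = "s3://my-bucket/huggingface/train"

# SageMaker downloads each channel to /opt/ml/input/data/<channel_name>
# inside the container, where train.py can read it.
estimator.fit({"train": train_input})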

Using SageMaker Training Compiler Containers

Reference: deep-learning-containers/available_images.md at master · aws/deep-learning-containers · GitHub

Supported Transformers version: 4.21.1

from sagemaker.huggingface import HuggingFace, TrainingCompilerConfig

estimator = HuggingFace(
  entry_point="train.py",
  source_dir="./scripts",
  instance_type="ml.p3.8large", 
  instance_count=1
  role=role,
  pytorch_version="1.11.0",
  transformers_version="4.21",
  py_version="py38",
  compiler_config=TrainingCompilerConfig(),  # Needed to use the Training Compiler.
  distribution={"pytorchxla": {"enabled": True}}  # Enables the optimized distributed training.
)
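
Training is launched with estimator.fit() as in the first example. To confirm which versions actually ended up in the container, train.py can log them at startup. A minimal sketch (the expected values in the comments come from the estimator arguments above):

# At the top of scripts/train.py
import torch
import transformers

# Print the library versions so a container/version mismatch shows up in the job logs.
print(f"transformers=={transformers.__version__}")  # expected 4.21.1
print(f"torch=={torch.__version__}")                # expected 1.11.0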