Running out of memory with all except the basic GPT-2 and GPT-Neo models on SageMaker

How do I set the max_split_size_mb, gradient_checkpointing, or fp16 parameters in the HuggingFace estimator constructor?
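For reference, when training locally I would set these roughly as in the sketch below (my assumptions: PyTorch reads the allocator option from the PYTORCH_CUDA_ALLOC_CONF environment variable, and gradient_checkpointing / fp16 are ordinary TrainingArguments). What I can't work out is where the equivalents go when the job is launched through the SageMaker estimator:

import os

# the allocator option is read when the CUDA caching allocator starts up,
# so it has to be in the environment before any CUDA work happens
os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'max_split_size_mb:512'

from transformers import TrainingArguments

# gradient_checkpointing and fp16 are Trainer/script arguments,
# not options of any constructor as far as I can tell
training_args = TrainingArguments(
    output_dir='out',
    gradient_checkpointing=True,
    fp16=True,
)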

My code is as follows:

import sagemaker
from sagemaker.huggingface import HuggingFace

role = sagemaker.get_execution_role()

hyper_params = {
    'model_name_or_path': gpt_model,
    'output_dir': '/opt/ml/model',
    'do_train': True,
    'train_file': '/opt/ml/input/data/train/{}'.format(training_file_name),
    'num_train_epochs': 5,
    'per_device_train_batch_size': 10,
}

git_config = {'repo': 'https://github.com/huggingface/transformers.git', 'branch': 'v4.17.0'}

huggingface_estimator = HuggingFace(
    entry_point='run_clm.py',
    source_dir='./examples/pytorch/language-modeling',
    instance_type='ml.g4dn.2xlarge',
    env={'max_split_size_mb': 512},
    instance_count=1,
    role=role,
    git_config=git_config,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    hyperparameters=hyper_params,
    gradient_checkpointing=True,
    fp16=True,
)

huggingface_estimator.fit({'train': s3_training_data}, wait=True)
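From reading the SageMaker SDK and Transformers docs, my best guess is that gradient_checkpointing and fp16 are not constructor arguments at all but script flags that belong inside hyperparameters (run_clm.py parses them as TrainingArguments), and that max_split_size_mb has to travel as the PYTORCH_CUDA_ALLOC_CONF environment variable, presumably through the estimator's environment argument rather than env. A sketch of what I think that would look like, unverified:

huggingface_estimator = HuggingFace(
    entry_point='run_clm.py',
    source_dir='./examples/pytorch/language-modeling',
    instance_type='ml.g4dn.2xlarge',
    instance_count=1,
    role=role,
    git_config=git_config,
    transformers_version='4.17.0',
    pytorch_version='1.10.2',
    py_version='py38',
    # script arguments for run_clm.py, forwarded as --gradient_checkpointing / --fp16
    hyperparameters={
        **hyper_params,
        'gradient_checkpointing': True,
        'fp16': True,
    },
    # allocator setting reaches PyTorch as an environment variable in the container
    environment={'PYTORCH_CUDA_ALLOC_CONF': 'max_split_size_mb:512'},
)

Is that the right way to do it, or is there a dedicated place for these settings in the HuggingFace constructor?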