I am looking to fine-tune GPT-J using Amazon SageMaker from my local environment. I have been following the tutorials and documentation at https://huggingface.co/docs/sagemaker/getting-started and https://huggingface.co/docs/sagemaker/inference#deploy-with-model_data. My training dataset is stored in S3, but I am running into errors caused by IAM role permissions. There is very little documentation covering which permissions are actually required to train a Hugging Face model using SageMaker.
If anyone knows what IAM role permissions are required to train a Hugging Face model, that would be great!
The one most relevant to your use case is the CreateTrainingJob API (see CreateTrainingJob in the Amazon SageMaker API reference), which requires the following permissions:
sagemaker:CreateTrainingJob
iam:PassRole
kms:CreateGrant (required only if the associated ResourceConfig has a specified VolumeKmsKeyId and the associated role does not have a policy that permits this action)
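As a rough illustration, the permissions listed above could be written out as an IAM policy document for the user or role that starts the training job. This is only a sketch: the function name `build_caller_policy` and the role ARN are hypothetical placeholders, the `iam:PassedToService` condition is an optional tightening I added, and `kms:CreateGrant` is left out because it only applies when a `VolumeKmsKeyId` is set.

```python
import json


def build_caller_policy(execution_role_arn: str) -> dict:
    """Build an IAM policy document for the identity that calls CreateTrainingJob.

    execution_role_arn is the role SageMaker will assume to run the job
    (a placeholder here, not a real role).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allows starting training jobs.
                "Sid": "StartTrainingJobs",
                "Effect": "Allow",
                "Action": "sagemaker:CreateTrainingJob",
                "Resource": "*",
            },
            {
                # Allows handing the execution role to SageMaker.
                # The condition is an assumption: it limits PassRole to SageMaker only.
                "Sid": "PassExecutionRole",
                "Effect": "Allow",
                "Action": "iam:PassRole",
                "Resource": execution_role_arn,
                "Condition": {
                    "StringEquals": {"iam:PassedToService": "sagemaker.amazonaws.com"}
                },
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical account ID and role name, for illustration only.
    role_arn = "arn:aws:iam::111122223333:role/MySageMakerExecutionRole"
    print(json.dumps(build_caller_policy(role_arn), indent=2))
```

The printed JSON can be pasted into the IAM console as an inline policy once the placeholder ARN is swapped for your own execution role.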
To allow the training job to access data in the S3 bucket, the following policy should work:
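The policy itself did not come through in this post, so here is a minimal sketch of what an S3 access policy for the training job's execution role might look like. It is my own assumption, not the original answer's policy: the function name `build_s3_access_policy` and the bucket name are hypothetical, and the exact set of actions you need may differ (for example, write access is only needed if the job writes output to the same bucket).

```python
import json


def build_s3_access_policy(bucket_name: str) -> dict:
    """Build an IAM policy document granting read/write access to one S3 bucket.

    Intended to be attached to the SageMaker execution role (assumption).
    """
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Sid": "ListTrainingBucket",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": bucket_arn,
            },
            {
                # Reading input data and writing artifacts applies to objects in the bucket.
                "Sid": "ReadWriteTrainingObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


if __name__ == "__main__":
    # "my-training-bucket" is a placeholder bucket name.
    print(json.dumps(build_s3_access_policy("my-training-bucket"), indent=2))
```

Note the split between the bucket ARN (for `s3:ListBucket`) and the `/*` object ARN (for `s3:GetObject` and `s3:PutObject`); mixing those up is a common cause of access-denied errors.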