Hi Alex
The way you instantiate the HuggingFaceModel class looks a bit unusual to me. I usually go about it this way:
    from sagemaker.huggingface import HuggingFaceModel

    huggingface_model = HuggingFaceModel(
        model_data="s3://hf-sagemaker-inference/model.tar.gz",  # S3 path to your trained SageMaker model
        role=role,                    # IAM role with permissions to create an Endpoint
        transformers_version="4.17",  # Transformers version used
        pytorch_version="1.10",       # PyTorch version used
        py_version="py38",            # Python version of the DLC
    )
In terms of serverless deployment, you seem to be doing everything right, as far as I can tell. Just make sure you use the latest DLC, i.e. specify the latest supported Transformers and PyTorch versions (4.17 and 1.10 respectively, in this case). You can find the latest versions here: Reference
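For what it's worth, here is a rough sketch of how I would wire the serverless part together with those versions. The helper names (`serverless_settings`, `deploy_serverless`) and the default memory and concurrency values are just my assumptions for illustration, not something taken from your setup. `ServerlessInferenceConfig` and the `serverless_inference_config` argument of `deploy` come from the SageMaker Python SDK. The actual deploy call needs AWS credentials and a valid role, so treat it as untested against your account.

```python
# Sketch only: helper names and default values below are hypothetical.

# DLC versions quoted in this post; check the reference list for newer ones.
DLC_VERSIONS = {
    "transformers_version": "4.17",
    "pytorch_version": "1.10",
    "py_version": "py38",
}

# Memory sizes (in MB) that SageMaker serverless endpoints accept.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_settings(memory_size_in_mb=4096, max_concurrency=10):
    """Validate and return keyword arguments for ServerlessInferenceConfig."""
    if memory_size_in_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(
            f"memory_size_in_mb must be one of {ALLOWED_MEMORY_MB}, "
            f"got {memory_size_in_mb}"
        )
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {
        "memory_size_in_mb": memory_size_in_mb,
        "max_concurrency": max_concurrency,
    }


def deploy_serverless(model_data, role, **settings):
    """Create the model with the pinned DLC versions and deploy it serverless."""
    # Imported here so the helper above can be used without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel
    from sagemaker.serverless import ServerlessInferenceConfig

    model = HuggingFaceModel(model_data=model_data, role=role, **DLC_VERSIONS)
    config = ServerlessInferenceConfig(**serverless_settings(**settings))
    # No instance type or count is passed: the serverless config replaces them.
    return model.deploy(serverless_inference_config=config)
```

You would then call something like `deploy_serverless("s3://hf-sagemaker-inference/model.tar.gz", role, memory_size_in_mb=4096)` and use the returned predictor as usual.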
You can also check out these two sample notebooks and mix and match to fit your use case:
Hope that helps!
Cheers
Heiko