I'm working on a project that needs to deploy raw Hugging Face models (no fine-tuning) to SageMaker Endpoints. I clone the model repo from the HF Hub, tar.gz it, upload it to S3, create my SageMaker Model and endpoint configuration, and deploy my endpoint. When I send a text payload to the endpoint, I get the error below.
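For context, the packaging and upload step looks roughly like this (the local model directory, bucket, and key are placeholders for what I actually use; the model files end up at the root of the archive):

import os
import tarfile
import boto3

# Package the cloned model directory so that the model files
# (config.json, pytorch_model.bin, tokenizer files, ...) sit at
# the root of the archive rather than inside a subfolder.
model_dir = "distilbert-base-uncased"  # placeholder local clone
with tarfile.open("model.tar.gz", "w:gz") as tar:
    for name in os.listdir(model_dir):
        tar.add(os.path.join(model_dir, name), arcname=name)

# Upload the archive to S3; "my-bucket" is a placeholder.
s3 = boto3.client("s3")
s3.upload_file("model.tar.gz", "my-bucket", "models/model.tar.gz")
model_file = "s3://my-bucket/models/model.tar.gz"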
I originally thought I was having the same problem as this thread: How to Create Model in SageMaker Console from .tar.gz
but the issue persists even after adding 'Environment': {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"} to my create_model call.
Any ideas on what I might be doing wrong?
SageMaker Error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Could not load model /.sagemaker/mms/models/model with any of the following classes: (\u003cclass \u0027transformers.models.auto.modeling_auto.AutoModelForMaskedLM\u0027\u003e, \u003cclass \u0027transformers.models.distilbert.modeling_distilbert.DistilBertForMaskedLM\u0027\u003e)."
}
Error from CloudWatch for the endpoint:
python: can't open file '/usr/local/bin/deep_learning_container.py': [Errno 13] Permission denied
My model definition:
import boto3
from time import gmtime, strftime

client = boto3.client("sagemaker")

# role and model_file (the S3 URI of the model.tar.gz) are defined earlier
model_name = "nlp-serverless-model-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

huggingface_model_config = client.create_model(
    ModelName=model_name,
    ExecutionRoleArn=role,
    Containers=[
        {
            "Image": "763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04",
            "Mode": "SingleModel",
            "ModelDataUrl": model_file,
            "Environment": {"MMS_DEFAULT_WORKERS_PER_MODEL": "1"},
        }
    ],
)
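For completeness, the endpoint configuration, deployment, and invocation look roughly like this (the serverless memory/concurrency values and the payload text are examples of what I'm using):

import json
import boto3
from time import gmtime, strftime

client = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

endpoint_config_name = "nlp-serverless-config-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())
endpoint_name = "nlp-serverless-endpoint-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime())

# Endpoint configuration pointing at the model created above;
# the serverless values are examples.
client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": model_name,  # from the create_model call above
            "ServerlessConfig": {"MemorySizeInMB": 4096, "MaxConcurrency": 1},
        }
    ],
)

# Deploy the endpoint and wait until it is in service.
client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=endpoint_config_name)
client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)

# Sending a text payload; this is the call that returns the 400 above.
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps({"inputs": "Paris is the [MASK] of France."}),
)
print(response["Body"].read().decode("utf-8"))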