Error: Could Not Load Model

Working on a project that needs to deploy raw HF models (without training them) using SageMaker Endpoints. I clone the model repo from the HF Hub, tar.gz it, upload it to S3, create my SageMaker Model and endpoint configuration, and deploy my endpoint. When I send a text payload to the endpoint, I get the error below.
I originally thought I was having the same problem as this thread: How to Create Model in SageMaker Console from .tar.gz
but the issue persists even after adding 'Environment': {"MMS_DEFAULT_WORKERS_PER_MODEL": '1'} to my create_model call.
Any ideas on what I might be doing wrong?

SageMaker Error:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
"code": 400,
"type": "InternalServerException",
"message": "Could not load model /.sagemaker/mms/models/model with any of the following classes: (<class 'transformers.models.auto.modeling_auto.AutoModelForMaskedLM'>, <class 'transformers.models.distilbert.modeling_distilbert.DistilBertForMaskedLM'>)."
}

Error from CloudWatch for the endpoint:
python: can't open file '/usr/local/bin/deep_learning_container.py': [Errno 13] Permission denied

My model definition:

huggingface_model_config = client.create_model(
    ModelName = "nlp-serverless-model-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime()),
    ExecutionRoleArn = role,
    Containers = [
        {
            'Image': '763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04',
            'Mode': 'SingleModel',
            'ModelDataUrl': model_file,
            'Environment': {"MMS_DEFAULT_WORKERS_PER_MODEL": '1'}
        }
    ],
)

This usually means the model.tar.gz archive was created incorrectly. You can find documentation and instructions here: Deploy models to Amazon SageMaker.
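
If it helps, here is a minimal sketch of building the archive so the model files (config.json, pytorch_model.bin, tokenizer files, etc.) sit at the root of the tarball rather than inside a subdirectory; the ./model path is just an assumption about where you cloned the repo:

import tarfile
from pathlib import Path

# Minimal sketch: pack the model files at the ROOT of the archive
# (no wrapping directory). Assumes the cloned repo lives in ./model --
# adjust the path to your setup.
model_dir = Path("model")

with tarfile.open("model.tar.gz", "w:gz") as tar:
    for f in model_dir.iterdir():
        # arcname=f.name drops the parent directory from the archive paths
        tar.add(f, arcname=f.name)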

Are you using an inference.py script or the env config with HF_TASK? Could you also please share the code for the endpoint configuration and endpoint?

@dbounds, I had a call with AWS support a few minutes ago and we rebuilt the whole config via the console. I created a little manual here:

Maybe you can give it a try?

Best,
David

Thank you for the guidance. I tried the directions in the link with the same result.

I am doing a git checkout of the specified model, distilbert-base-uncased in this case, creating a model.tar.gz, and loading that into S3 as the target model (for this use case, I am skipping training).

Model, endpoint config and endpoint code blocks below:

huggingface_model = HuggingFaceModel(
    model_data=f"s3://{s3_bucket}/{s3_prefix}/model.tar.gz",
    role=role,
    transformers_version="4.12.3",
    pytorch_version="1.9.1",
    py_version='py38',
    env={
        'HF_TASK': 'text-classification'
    },
)
huggingface_model_config = client.create_model(
    ModelName =  "nlp-serverless-model-" + strftime("%Y-%m-%d-%H-%M-%S", gmtime()),
    ExecutionRoleArn = role,
    Containers = [
        {
            'Image': '763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04',
            'Mode': 'SingleModel',
            'ModelDataUrl': model_file,
            'Environment': {"MMS_DEFAULT_WORKERS_PER_MODEL": '1'}
        }
    ],
)
endpoint_config_response = client.create_endpoint_config(
    EndpointConfigName=epc_name,
    ProductionVariants=[
        {
            'VariantName': 'single-variant',
            'ModelName': sm_model_name.split("/")[1],
            'ServerlessConfig': {
                'MemorySizeInMB': 6144,
                'MaxConcurrency': 10,
            },
        },
    ],
)
create_endpoint_response = client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=epc_name,
)
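
For completeness, this is roughly how I invoke the endpoint (a sketch; the example sentence is illustrative, and the {"inputs": ...} JSON shape is what the HF inference toolkit expects when serving via HF_TASK):

import boto3
import json

runtime = boto3.client("sagemaker-runtime")

# Sketch of the invocation; endpoint_name is the same variable used in
# create_endpoint above.
response = runtime.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType="application/json",
    Body=json.dumps({"inputs": "I love using SageMaker Serverless Inference!"}),
)
print(response["Body"].read().decode("utf-8"))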

DBB: Appreciate the link. Tried again with 6144 set as MemorySizeInMB, with the same result.

I’ve run into an issue like this before, how’d you tar your file? Are you using a Mac?

I removed the model objects I did not need (tried without this step as well with the same result) and created the tar from a SageMaker notebook instance:

!rm flax_model.msgpack
!rm rust_model.ot
!rm tf_model.h5
!tar zcvf model.tar.gz *

In SageMaker I use the tarfile package. Have you tried different image versions? Maybe it's a dependency issue.
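
For what it's worth, a quick sanity check on the finished archive (a sketch, assuming model.tar.gz is in the working directory):

import tarfile

# List what actually went into the archive -- every entry should sit at
# the root (e.g. "config.json"), not under a subdirectory, and there
# should be no macOS "._" metadata files mixed in.
with tarfile.open("model.tar.gz", "r:gz") as tar:
    for name in tar.getnames():
        print(name)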