Cannot invoke SageMaker endpoint, keep getting OS error


Could any experts on this topic give me a hand?

I keep getting this error:

An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Can\u0027t load config for \u0027/.sagemaker/mms/models/model\u0027. If you were trying to load it from \u0027https://huggingface.co/models\u0027, make sure you don\u0027t have a local directory with the same name. Otherwise, make sure \u0027/.sagemaker/mms/models/model\u0027 is the correct path to a directory containing a config.json file"
}"


When I go into the logs, I see:

File "/opt/conda/lib/python3.8/site-packages/transformers/", line 1936, in cached_path raise EnvironmentError(f"file {url_or_filename} not found"), OSError: file /.sagemaker/mms/models/model/config.json not found

I am not using a public Hugging Face model; I trained my own model in a SageMaker notebook and stored it in an S3 bucket. Deploying the endpoint succeeds; the problem appears only when I invoke it.

I have it working in one dev environment, but when I follow the same setup for the prod environment, the endpoint does not work.

What I have done to debug:

  • Checked all roles and policies (S3 PutObject, GetObject, …). The role I created lets me deploy successfully, so I have no idea what is missing.

  • Checked that model.tar.gz, once unpacked, contains config.json, …

  • Checked the network.
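The archive check above can be sketched like this (a minimal sketch, assuming a local copy of the model.tar.gz to inspect):

```python
import tarfile

def archive_members(path):
    """List member names of a .tar.gz, normalizing a leading './'."""
    with tarfile.open(path, "r:gz") as tar:
        return [name.lstrip("./") for name in tar.getnames()]

def has_root_config(path):
    """True if config.json sits at the archive ROOT, which is where the
    SageMaker Hugging Face inference container looks for it."""
    return "config.json" in archive_members(path)
```

If config.json only shows up nested (e.g. under a subdirectory), the container cannot find it and raises exactly the 400 above.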


I fine-tuned a BERT-based model from Hugging Face and deployed it with the deploy() function of sagemaker.huggingface.model.HuggingFaceModel. Everything works in dev, but I hit the error above in the prod environment. I tested the deployment both with and without a VPC, and it still does not work.

Code I use:

```python
from sagemaker.huggingface.model import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data=model_data,          # path to your trained SageMaker model
    role=role,                      # IAM role with permissions to create an endpoint
    transformers_version="4.17.0",  # Transformers version used
    pytorch_version="1.10.2",       # PyTorch version used
    py_version="py38",              # Python version used
    env={"HF_TASK": "token-classification"},
    # vpc_config=vpc_config,        # Specify VPC settings
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",   # placeholder; my actual deploy args were cut off here
)
```
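For reference, the call that fails is a plain InvokeEndpoint. A sketch of the request I send (the endpoint name and sample sentence are illustrative, not from my actual setup):

```python
import json

def build_invoke_args(endpoint_name, text):
    """Assemble kwargs for sagemaker-runtime invoke_endpoint.
    No AWS call happens here; this only builds the request."""
    return {
        "EndpointName": endpoint_name,
        "ContentType": "application/json",
        "Body": json.dumps({"inputs": text}),
    }

# Usage against a live endpoint (requires AWS credentials):
# import boto3
# runtime = boto3.client("sagemaker-runtime")
# response = runtime.invoke_endpoint(**build_invoke_args("my-endpoint", "My name is Sarah"))
```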

How have you created the model.tar.gz? The error indicates that its structure is wrong. Check the documentation: Deploy models to Amazon SageMaker
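Concretely, the container expects a flat archive: config.json at the root of model.tar.gz, not inside a subdirectory. A typical layout (file names other than config.json vary by model and are illustrative here):

```text
model.tar.gz
├── config.json
├── pytorch_model.bin
├── tokenizer_config.json
└── vocab.txt
```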


You’re right. I went through the documentation you mentioned and did exactly what it said. I later figured out that the problem was with how the gz file was compressed. Even though I used the same gz file for both dev and prod environments, what I had to do was unpack it, re-compress the same content, and re-upload it to S3. Eventually this worked.
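The unpack-and-repack step can be sketched in Python (source and destination paths are illustrative); the key detail is arcname=name, which forces every file to sit at the archive root with no "./" or directory prefix:

```python
import os
import tarfile
import tempfile

def repack(src_tgz, dst_tgz):
    """Extract an existing model.tar.gz and re-create it with the same
    content, with every member placed at the archive root."""
    with tempfile.TemporaryDirectory() as tmp:
        with tarfile.open(src_tgz, "r:gz") as tar:
            tar.extractall(tmp)
        with tarfile.open(dst_tgz, "w:gz") as tar:
            for name in sorted(os.listdir(tmp)):
                tar.add(os.path.join(tmp, name), arcname=name)
```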