Cannot invoke SageMaker endpoint, keep getting OSError

Hi,

Could any experts on this topic give me a hand?

I keep getting this error:


```
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Can\u0027t load config for \u0027/.sagemaker/mms/models/model\u0027. If you were trying to load it from \u0027https://huggingface.co/models\u0027, make sure you don\u0027t have a local directory with the same name. Otherwise, make sure \u0027/.sagemaker/mms/models/model\u0027 is the correct path to a directory containing a config.json file"
}
```

When I go into the logs, I see:

File "/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py", line 1936, in cached_path raise EnvironmentError(f"file {url_or_filename} not found"), OSError: file /.sagemaker/mms/models/model/config.json not found

I am not using a public Hugging Face model; I use my own model, trained in a SageMaker notebook and stored in an S3 bucket. Deploying the endpoint succeeds; the problem only appears when I invoke the endpoint.
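For context, the invocation itself is nothing unusual; a minimal sketch of what the calling side does (the endpoint name, region, and payload below are placeholders, not my real values):

```
import json

import boto3

# Runtime client for the deployed endpoint (region is a placeholder)
runtime = boto3.client("sagemaker-runtime", region_name="eu-west-1")

# With HF_TASK=token-classification the container expects an "inputs" field
response = runtime.invoke_endpoint(
    EndpointName="my-ner-endpoint",  # placeholder endpoint name
    ContentType="application/json",
    Body=json.dumps({"inputs": "Amazon SageMaker is based in Seattle."}),
)
print(json.loads(response["Body"].read()))
```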

I have it working in one dev environment, but when I follow the same setup for the prod environment, the endpoint does not work.

What I have done to debug:

  • Checked all roles and policies (s3:PutObject, s3:GetObject, …). The role I created lets me deploy successfully, so I have no idea what else could be missing.

  • Checked that model.tar.gz, once unpacked, contains config.json etc. (see the sketch after this list).

  • Checked the network.
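For the archive check above, a minimal sketch of what I ran locally (the path is a placeholder); the files have to sit at the root of the archive, not inside a subfolder:

```
import tarfile

archive_path = "model.tar.gz"  # local copy of the archive from S3 (placeholder)

# List the archive members; config.json should appear at the top level,
# e.g. "config.json", not "model/config.json".
with tarfile.open(archive_path, "r:gz") as tar:
    print(tar.getnames())
```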

Background:

I fine-tuned a BERT-based model from Hugging Face and deployed it with the HuggingFaceModel class from sagemaker.huggingface.model and its deploy() function. Everything works well in dev, but I get the issue above in the prod environment. I tested the deployment both with and without a VPC, but it still does not work.

Code I use:

```
from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data=model_data,          # path to your trained SageMaker model in S3
    role=role,                      # IAM role with permissions to create an endpoint
    transformers_version="4.17.0",  # Transformers version used
    pytorch_version="1.10.2",       # PyTorch version used
    py_version="py38",              # Python version used
    env={
        "HF_TASK": "token-classification",
        "SAGEMAKER_CONTAINER_LOG_LEVEL": xxx,
        "SAGEMAKER_REGION": "xxx",
    },
    # vpc_config=vpc_config,        # Specify VPC settings
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
```
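Invoking it afterwards is just the standard predict call (the sentence is only a placeholder):

```
# Token-classification payload expected by the Hugging Face inference toolkit
result = predictor.predict({"inputs": "My name is Wolfgang and I live in Berlin."})
print(result)
```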

How have you created the model.tar.gz? The error indicates that its structure is wrong. Check the documentation: Deploy models to Amazon SageMaker


Hi,

You’re right. I went through the documentation you mentioned and did exactly what it said. I later figured out that the problem was with how the .tar.gz file was compressed: even though I used the same file for both the dev and prod environments, I had to unpack it, re-pack the same content, and re-upload it to S3. That eventually fixed it.
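For anyone hitting the same thing, this is roughly the repacking step (the folder, archive path, bucket, and key below are placeholders, assuming the local folder holds the extracted model files); the important part is that config.json ends up at the root of the archive:

```
import os
import tarfile

import boto3

model_dir = "model"            # folder with config.json, pytorch_model.bin, ... (placeholder)
archive_path = "model.tar.gz"  # rebuilt archive (placeholder)

# Add the *contents* of the folder, not the folder itself, so that
# config.json sits at the root of the archive.
with tarfile.open(archive_path, "w:gz") as tar:
    for name in os.listdir(model_dir):
        tar.add(os.path.join(model_dir, name), arcname=name)

# Re-upload the archive to S3 (bucket and key are placeholders)
boto3.client("s3").upload_file(archive_path, "my-bucket", "models/model.tar.gz")
```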

Hey @darklee, thanks for mentioning the compression issue, otherwise I would never have figured it out. I fine-tuned a model and was creating the .gz file directly from the folder, and it threw an error at inference time. I converted the folder into a zip file, then converted that into a .gz file, and then it worked.