Hi,
Could any experts on this topic give me a hand?
I keep getting this error:
```
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from primary with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "Can't load config for '/.sagemaker/mms/models/model'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure '/.sagemaker/mms/models/model' is the correct path to a directory containing a config.json file"
}
```
When I go into the logs, I see:
```
File "/opt/conda/lib/python3.8/site-packages/transformers/file_utils.py", line 1936, in cached_path
    raise EnvironmentError(f"file {url_or_filename} not found")
OSError: file /.sagemaker/mms/models/model/config.json not found
```
I am not using a public Hugging Face model; I use my own model, trained in a SageMaker notebook and stored in an S3 bucket. Deploying the endpoint succeeds. The problem only appears when I invoke the endpoint.
I have it working in one dev environment, but when I follow the same setup in the prod environment, the endpoint does not work.
What I have done to debug:
- checked all roles and policies (`s3:PutObject`, `s3:GetObject`, …); the role I created lets me deploy successfully, so I have no idea.
- checked that model.tar.gz, once extracted, contains config.json, etc.
- checked the network.
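For reference, this is how I check the archive contents. As far as I understand, the Hugging Face container extracts model.tar.gz into `/.sagemaker/mms/models/model` and expects `config.json` at the archive root, so a `config.json` nested inside a subfolder can still produce this exact "config.json not found" error. A minimal sketch (the archive path is a placeholder):

```python
import tarfile

def config_at_root(archive_path):
    """Return True if config.json sits at the top level of the tar.gz archive."""
    with tarfile.open(archive_path, "r:gz") as tar:
        # normalize names like "./config.json" to "config.json"
        names = [m.name.lstrip("./") for m in tar.getmembers()]
    return "config.json" in names

# Example (placeholder path):
# print(config_at_root("model.tar.gz"))
```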
Background:
I fine-tuned a BERT-based model from Hugging Face and deployed it with the `deploy` function of `HuggingFaceModel` (from `sagemaker.huggingface.model`). Everything works in dev, but I hit the error above in the prod environment. I tested the deployment both with and without a VPC, but it still does not work.
Code I use:
```python
from sagemaker.huggingface.model import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    model_data=model_data,          # S3 path to your trained SageMaker model
    role=role,                      # IAM role with permissions to create an endpoint
    transformers_version="4.17.0",  # Transformers version used
    pytorch_version="1.10.2",       # PyTorch version used
    py_version="py38",              # Python version used
    env={
        "HF_TASK": "token-classification",
        "SAGEMAKER_CONTAINER_LOG_LEVEL": "xxx",
        "SAGEMAKER_REGION": "xxx",
    },
    # vpc_config=vpc_config,  # Specify VPC settings
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)
```
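Since dev works and prod doesn't, I also try to rule out that `model_data` points at an artifact the prod account can't actually read (e.g. a bucket in another account or region). A small sketch of the check I run, assuming boto3 credentials for the prod account; the URI is a placeholder:

```python
from urllib.parse import urlparse

def parse_s3_uri(s3_uri):
    """Split an s3:// URI into (bucket, key)."""
    parsed = urlparse(s3_uri)
    return parsed.netloc, parsed.path.lstrip("/")

def check_model_data(s3_uri):
    """Raise botocore ClientError if the artifact is not readable with the current credentials."""
    import boto3  # imported here so parse_s3_uri works without boto3 installed

    bucket, key = parse_s3_uri(s3_uri)
    boto3.client("s3").head_object(Bucket=bucket, Key=key)

# Example (placeholder URI):
# check_model_data("s3://my-bucket/path/to/model.tar.gz")
```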