Now I would like to deploy it to SageMaker Serverless Inference. As far as I understand, the next step is to create a .tar.gz file of the model and upload it to S3. Is it sufficient to simply create a .tar.gz file from the folder mentioned above, or do I have to pay attention to anything else?
You also need to save your tokenizer with tokenizer.save_pretrained(). Then you can create a model.tar.gz by following the steps in the documentation: Deploy models to Amazon SageMaker
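A minimal sketch of the packaging step with Python's standard tarfile module, assuming the model and tokenizer were already written into one folder with model.save_pretrained(...) and tokenizer.save_pretrained(...). The folder and archive names are hypothetical. The point it illustrates: the files go at the root of the archive rather than inside a subfolder, which is the layout the Hugging Face documentation describes (the equivalent of `cd my_model && tar zcvf model.tar.gz *`).

```python
import os
import tarfile
from typing import List


def create_model_tar(model_dir: str, output_path: str) -> List[str]:
    """Pack every file under model_dir into a gzipped tar at output_path.

    Paths inside the archive are relative to model_dir, so config.json,
    pytorch_model.bin, tokenizer files, etc. sit at the archive root.
    Returns the archive member names in the order they were added.
    """
    model_dir = os.path.abspath(model_dir)
    output_path = os.path.abspath(output_path)
    # Writing the archive inside the folder being packed would make it
    # try to include itself, so refuse that layout.
    if os.path.commonpath([model_dir, output_path]) == model_dir:
        raise ValueError("output_path must be outside model_dir")

    members: List[str] = []
    with tarfile.open(output_path, "w:gz") as tar:
        for root, dirs, files in os.walk(model_dir):
            dirs.sort()  # deterministic traversal order
            for name in sorted(files):
                full_path = os.path.join(root, name)
                arcname = os.path.relpath(full_path, model_dir)
                tar.add(full_path, arcname=arcname)
                members.append(arcname)
    return members


# Hypothetical usage, after filling the folder with e.g.
#   model.save_pretrained("my_model")
#   tokenizer.save_pretrained("my_model")
# create_model_tar("my_model", "model.tar.gz")
```

The resulting model.tar.gz is what gets uploaded to S3.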
Thanks for your advice! I added the tokenizer files to the model folder and created one big tar.gz.
After that I uploaded the file to S3 and created a model in the SageMaker console. I assigned the following container image to the model: 763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04
I successfully created an endpoint configuration and the endpoint, but when I try to invoke the endpoint via Postman I get the following error:
{"ErrorCode":"INTERNAL_DEPENDENCY_EXCEPTION","Message":"An exception occurred from internal dependency. Please contact customer support regarding request XXXXXXXXXX."}
Do you have any idea how to debug this? The error message isn't really helpful. (I also opened a case with AWS Support.)
You don't need to do this manually. To deploy the model, you can use the Python SageMaker SDK with HuggingFaceModel and just point it to your model.tar.gz on S3; the SDK handles creating all of the resources. It looks like something went wrong while creating the resources.
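A minimal sketch of the SDK route, assuming a recent SageMaker Python SDK (`pip install sagemaker`) and AWS credentials. It reuses the CPU image from this thread via `image_uri`; with an explicit image, the framework version arguments should not be needed. The S3 URI and role ARN in the usage comment are placeholders, and the list of allowed memory sizes (1 GB steps up to 6144 MB) is an assumption to check against the current limits. The SDK imports sit inside the function only so the small validation helper can be used without the SDK installed.

```python
from typing import Dict

# Inference image used in this thread (CPU-only, eu-west-1).
IMAGE_URI = (
    "763104351884.dkr.ecr.eu-west-1.amazonaws.com/"
    "huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04"
)

# Assumption: serverless endpoints accept memory sizes in 1 GB steps
# from 1024 MB to 6144 MB; check the current SageMaker limits.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def serverless_settings(memory_size_in_mb: int = 6144, max_concurrency: int = 1) -> Dict[str, int]:
    """Validate and return keyword arguments for ServerlessInferenceConfig."""
    if memory_size_in_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"unsupported memory size: {memory_size_in_mb} MB")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return {"memory_size_in_mb": memory_size_in_mb, "max_concurrency": max_concurrency}


def deploy_serverless(model_data: str, role: str, memory_size_in_mb: int = 6144, max_concurrency: int = 1):
    """Create model, endpoint config and endpoint in one call; returns a predictor."""
    # Imported here so serverless_settings() works without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel
    from sagemaker.serverless import ServerlessInferenceConfig

    model = HuggingFaceModel(
        model_data=model_data,  # S3 URI of model.tar.gz
        role=role,              # IAM role ARN that SageMaker assumes
        image_uri=IMAGE_URI,    # explicit image instead of framework versions
    )
    config = ServerlessInferenceConfig(**serverless_settings(memory_size_in_mb, max_concurrency))
    return model.deploy(serverless_inference_config=config)


# Hypothetical usage (placeholders, requires AWS credentials):
# predictor = deploy_serverless("s3://my-bucket/model.tar.gz",
#                               "arn:aws:iam::123456789012:role/my-sagemaker-role")
# print(predictor.predict({"inputs": "Your Text"}))
```

The returned predictor sends JSON, so the payload has the same `{"inputs": ...}` shape as a raw HTTP request.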
I spoke to AWS Support and they told me to set the memory size of the container runtime to 3 GB. With 3 GB assigned, the endpoint returns:
{"ErrorCode":"INTERNAL_FAILURE_FROM_MODEL","LogStreamArn":null,"Message":"Received server error (0) from model with message \"Amazon SageMaker could not get a response from the gottbert-job-class-endpoint endpoint.\". ..........
I checked the CloudWatch logs and there I see:
python: can't open file '/usr/local/bin/deep_learning_container.py': [Errno 13] Permission denied
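The same log search can be done programmatically. A minimal sketch using the CloudWatch Logs `filter_log_events` paginator, assuming the endpoint's container logs live in the log group `/aws/sagemaker/Endpoints/<endpoint-name>` and that you pass in a boto3 `logs` client. Taking the client as a parameter keeps the function testable without AWS access.

```python
from typing import List


def endpoint_log_group(endpoint_name: str) -> str:
    """Log group SageMaker uses for an endpoint's container logs."""
    return f"/aws/sagemaker/Endpoints/{endpoint_name}"


def find_log_lines(logs_client, endpoint_name: str, pattern: str) -> List[str]:
    """Return messages from the endpoint's log group that match a CloudWatch filter pattern."""
    messages: List[str] = []
    paginator = logs_client.get_paginator("filter_log_events")
    pages = paginator.paginate(
        logGroupName=endpoint_log_group(endpoint_name),
        filterPattern=pattern,  # quote the pattern to match an exact phrase
    )
    for page in pages:
        for event in page.get("events", []):
            messages.append(event["message"])
    return messages


# Hypothetical usage (requires AWS credentials):
# import boto3
# logs = boto3.client("logs", region_name="eu-west-1")
# for line in find_log_lines(logs, "gottbert-job-class-endpoint", '"Permission denied"'):
#     print(line)
```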
It seems that the issue is similar to this discussion…?
@philschmid @YannAgora, thanks for your help! I just want to give you a short update on the case; maybe the following is also relevant for anyone who encounters the problem or wants to set up serverless inference via the SageMaker console.
TL;DR
The endpoint returns predictions
BUT the error is still visible in the logs: python: can't open file '/usr/local/bin/deep_learning_container.py': [Errno 13] Permission denied
Rebuilding the endpoint configuration with the maximum memory (6 GB) magically resolved the error
AWS Support is aware of the permission issue and is still investigating a fix
I had a call with AWS Support and showed them the issue. We manually deleted the endpoint, the endpoint configuration and the model.
Then we did the following manually via the AWS SageMaker console:
1. Create the model from the .tar.gz file on S3 || Amazon SageMaker → Models → Create model
1.1 Make sure the assigned IAM role has the AmazonSageMakerFullAccess IAM policy attached.
1.2 Select “Provide model artifacts and inference image location”
1.3 Select “Use a single model”
1.4 Location of inference code image: we used a predefined CPU-only Hugging Face AWS image for inference. Adjust the region in the image URI to your own; for eu-west-1 it is 763104351884.dkr.ecr.eu-west-1.amazonaws.com/huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04
1.5 Location of model artifacts: copy the S3 URI of the .tar.gz.
1.6 We left all other settings untouched and saved the model.
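Step 1 corresponds roughly to a single boto3 `create_model` call. A minimal sketch, assuming you pass in a boto3 SageMaker client (for example `boto3.client("sagemaker", region_name="eu-west-1")`); the model name, role ARN and S3 URI are placeholders. Passing the client in also makes the function easy to test without AWS access.

```python
# Inference image used in this thread (CPU-only, eu-west-1).
IMAGE_URI = (
    "763104351884.dkr.ecr.eu-west-1.amazonaws.com/"
    "huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04"
)


def create_hf_model(sm_client, model_name: str, role_arn: str, model_data_url: str,
                    image_uri: str = IMAGE_URI) -> dict:
    """Register a single-container model, mirroring the console steps for the model."""
    return sm_client.create_model(
        ModelName=model_name,
        ExecutionRoleArn=role_arn,  # role with AmazonSageMakerFullAccess attached
        PrimaryContainer={
            "Image": image_uri,              # location of inference code image
            "ModelDataUrl": model_data_url,  # S3 URI of model.tar.gz
            "Mode": "SingleModel",           # "Use a single model"
        },
    )


# Hypothetical usage (placeholders, requires AWS credentials):
# import boto3
# sm = boto3.client("sagemaker", region_name="eu-west-1")
# create_hf_model(sm, "my-hf-model", "arn:aws:iam::123456789012:role/my-sagemaker-role",
#                 "s3://my-bucket/model.tar.gz")
```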
2. Create the endpoint configuration || Amazon SageMaker → Endpoint configuration → Create endpoint configuration
2.1 Type of endpoint: Serverless
2.2 Production variants → Click “Add Model”, select the model you created in step 1 and save.
2.3 Back in the main setup, click “Edit” next to the selected model, assign 6 GB of Memory Size to the model (I also set Max Concurrency to 1, but I'm not sure that's necessary) and hit Save.
2.4 Save the endpoint config by clicking “Create endpoint configuration”.
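Step 2 maps to `create_endpoint_config` with a `ServerlessConfig` on the production variant, where the console's Memory Size and Max Concurrency fields become `MemorySizeInMB` and `MaxConcurrency`. A minimal sketch, assuming a boto3 SageMaker client is passed in; the variant name "AllTraffic" is a hypothetical choice, and the list of allowed memory sizes (1 GB steps up to 6144 MB) is an assumption to check against the current limits.

```python
# Assumption: serverless endpoints accept memory sizes in 1 GB steps
# from 1024 MB to 6144 MB; check the current SageMaker limits.
ALLOWED_MEMORY_MB = (1024, 2048, 3072, 4096, 5120, 6144)


def create_serverless_endpoint_config(sm_client, config_name: str, model_name: str,
                                      memory_size_in_mb: int = 6144,
                                      max_concurrency: int = 1) -> dict:
    """Create a serverless endpoint configuration for an existing model."""
    if memory_size_in_mb not in ALLOWED_MEMORY_MB:
        raise ValueError(f"unsupported memory size: {memory_size_in_mb} MB")
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    return sm_client.create_endpoint_config(
        EndpointConfigName=config_name,
        ProductionVariants=[
            {
                "VariantName": "AllTraffic",  # hypothetical variant name
                "ModelName": model_name,
                # No instance type or count: ServerlessConfig makes it serverless.
                "ServerlessConfig": {
                    "MemorySizeInMB": memory_size_in_mb,
                    "MaxConcurrency": max_concurrency,
                },
            }
        ],
    )
```

Using 6144 MB here matches the 6 GB setting that resolved the error in this thread.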
3. Create the endpoint || Amazon SageMaker → Endpoints → Create and configure endpoint
3.1 Name the endpoint
3.2 Select “Use an existing endpoint configuration”
3.3 Select the endpoint configuration and click “Select endpoint configuration”
3.4 Click “Create endpoint”
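Step 3 and the invocation described in the next paragraph can be sketched with boto3 as well. boto3 signs requests with AWS Signature Version 4 for you, which any direct call to the invocation URL also needs (in Postman, via the AWS Signature authorization type). A minimal sketch, assuming boto3 `sagemaker` and `sagemaker-runtime` clients are passed in; endpoint and config names are placeholders.

```python
import json


def create_and_wait(sm_client, endpoint_name: str, config_name: str) -> None:
    """Create the endpoint from an existing config and block until it is InService."""
    sm_client.create_endpoint(EndpointName=endpoint_name, EndpointConfigName=config_name)
    # The waiter polls DescribeEndpoint and raises if creation fails or times out.
    sm_client.get_waiter("endpoint_in_service").wait(EndpointName=endpoint_name)


def invoke(runtime_client, endpoint_name: str, text: str):
    """Send {"inputs": text} to the endpoint and return the decoded JSON response."""
    response = runtime_client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": text}),
    )
    return json.loads(response["Body"].read())


# Hypothetical usage (placeholders, requires AWS credentials):
# import boto3
# sm = boto3.client("sagemaker", region_name="eu-west-1")
# runtime = boto3.client("sagemaker-runtime", region_name="eu-west-1")
# create_and_wait(sm, "my-endpoint", "my-endpoint-config")
# print(invoke(runtime, "my-endpoint", "Your Text"))
```

If the endpoint returns an error such as INTERNAL_FAILURE_FROM_MODEL, boto3 raises it as a ClientError carrying the same message.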
It will take a few minutes to create the endpoint. After that, you can open the endpoint and copy the invocation URL, which looks like https://runtime.sagemaker.eu-west-1.amazonaws.com/endpoints/endpoint-name/invocations. Then you can send POST requests to the endpoint. The request body has the format {"inputs":"Your Text"} and the endpoint will return something like: