SageMaker Serverless Inference

Hi there,

I have been trying to use the new serverless feature of SageMaker Inference, following the steps explained very well by @juliensimon in his video (using the same image for the container and the same ServerlessConfig) to deploy a Hugging Face model (not fine-tuned on my side). However, after successfully creating/deploying all the resources (Model, EndpointConfig, Endpoint) and calling InvokeEndpoint, I got this error:

'Message': 'An exception occurred from internal dependency. Please contact customer support regarding request ...'

And when looking in CloudWatch, I also see this message:

python: can't open file "/usr/local/bin/deep_learning_container.py": [Errno 13] Permission denied

I don't get this error when calling InvokeEndpoint on a non-serverless inference endpoint.
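
For reference, here is roughly the flow I followed (a minimal boto3 sketch; the role ARN, resource names, and image URI are placeholders, the real image being the Hugging Face inference DLC used in the video):

```python
import boto3

sm = boto3.client("sagemaker")
runtime = boto3.client("sagemaker-runtime")

# Register the model (placeholder names; HF_MODEL_ID/HF_TASK tell the
# Hugging Face inference container which Hub model to load).
sm.create_model(
    ModelName="paraphrase-model",
    ExecutionRoleArn="arn:aws:iam::123456789012:role/MySageMakerRole",
    PrimaryContainer={
        "Image": "<huggingface-inference-dlc-image-uri>",
        "Environment": {
            "HF_MODEL_ID": "tuner007/pegasus_paraphrase",
            "HF_TASK": "text2text-generation",
        },
    },
)

# Endpoint config with serverless settings instead of an instance type.
sm.create_endpoint_config(
    EndpointConfigName="paraphrase-serverless-config",
    ProductionVariants=[
        {
            "VariantName": "AllTraffic",
            "ModelName": "paraphrase-model",
            "ServerlessConfig": {"MemorySizeInMB": 6144, "MaxConcurrency": 1},
        }
    ],
)

sm.create_endpoint(
    EndpointName="paraphrase-serverless",
    EndpointConfigName="paraphrase-serverless-config",
)
sm.get_waiter("endpoint_in_service").wait(EndpointName="paraphrase-serverless")

# The endpoint reaches InService, but this call returns the error above:
response = runtime.invoke_endpoint(
    EndpointName="paraphrase-serverless",
    ContentType="application/json",
    Body='{"inputs": "Rewrite this sentence, please."}',
)
print(response["Body"].read())
```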

Has anyone else encountered this error?

Thanks in advance!

Hey @YannAgora,

Thanks for opening the thread. We encountered this error as well; see the blog post:

  • Limitation found when testing: currently, Transformer models larger than 512 MB produce errors (a quick size check is sketched below)

We already reported this error to the SageMaker team.
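
In the meantime, here is a quick way to estimate whether a model repo is over that limit before deploying (a sketch using huggingface_hub; the repo id is just an example):

```python
from huggingface_hub import HfApi

# Sum the sizes of all files in a model repo (substitute your own model id).
info = HfApi().model_info("distilbert-base-uncased", files_metadata=True)
total_bytes = sum(s.size or 0 for s in info.siblings)
print(f"Total repo size: {total_bytes / 1024**2:.0f} MB")
```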

Which model are you trying to use, and with which memory configuration?

Hi @philschmid,

So far I have tried tuner007/pegasus_paraphrase and Vamsi/T5_Paraphrase_Paws, which are indeed both over 512 MB, so that confirms the limitation you found. As for MemorySize, I set it to 6144 MB, which is the maximum value available if I'm not mistaken.

I’ll let you know when this is fixed! I hope soon!
