How to quickly change the inference.py for an endpoint on AWS SageMaker

I wondered how to change the inference script of a deployed Hugging Face model.

Unfortunately, to my knowledge it is not really possible to change the script on an endpoint that is already deployed.

What you could do instead is use local mode in SageMaker. Instead of deploying the model to a real-time endpoint, you can “simulate” the deployment on a local machine (e.g. a notebook instance). This way you can quickly test the deployment and change inference.py as needed. Once the tests are successful, you can deploy to the actual endpoint.
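To give a rough idea of what this can look like, here is a minimal sketch using the SageMaker Python SDK. The helper names (local_model_kwargs, deploy_locally), the file layout (code/inference.py, a local model.tar.gz) and the framework versions are my assumptions for illustration, so adjust them to your setup. Local mode also needs Docker on the machine, and the role is still passed as an ARN string.

```python
def local_model_kwargs(model_data, role, entry_point="inference.py", source_dir="code"):
    """Collect the HuggingFaceModel arguments for a local-mode test deployment.

    The version strings are assumptions; use a combination supported by
    the Hugging Face inference containers you want to test against.
    """
    return {
        "model_data": model_data,      # e.g. "file://./model.tar.gz" (assumed path)
        "role": role,                  # IAM role ARN (placeholder in local tests)
        "entry_point": entry_point,    # the script you want to iterate on
        "source_dir": source_dir,      # folder that contains the script
        "transformers_version": "4.26.0",
        "pytorch_version": "1.13.1",
        "py_version": "py39",
    }


def deploy_locally(model_data, role, instance_type="local"):
    """Start the inference container on this machine instead of a real endpoint."""
    # Imported lazily so the helper above works without the SDK installed.
    from sagemaker.huggingface import HuggingFaceModel
    from sagemaker.local import LocalSession

    session = LocalSession()
    # Run the code from the local folder instead of uploading it to S3.
    session.config = {"local": {"local_code": True}}

    model = HuggingFaceModel(
        sagemaker_session=session,
        **local_model_kwargs(model_data, role),
    )
    # instance_type="local" runs on CPU; "local_gpu" uses a local GPU.
    return model.deploy(initial_instance_count=1, instance_type=instance_type)


# Example usage (hypothetical values):
# predictor = deploy_locally("file://./model.tar.gz", "arn:aws:iam::111111111111:role/MyRole")
# print(predictor.predict({"inputs": "I love local mode"}))
# predictor.delete_endpoint()  # stop the container before redeploying
```

After editing inference.py, delete the local endpoint and call deploy_locally again. That takes far less time than redeploying a real endpoint. Once it behaves as expected, the same arguments with a regular session and a real instance type (e.g. ml.m5.xlarge) should deploy to the actual endpoint.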

Here is an example of how to do that with a Hugging Face model. More info is available on the GitHub page.

Hope that helps.

Cheers
Heiko