I’m deploying some inference endpoints on SageMaker using the Hugging Face Inference Toolkit, and am overriding some of the default methods (model_fn and predict_fn) as described here:
This works, but testing changes to my inference.py is slow and cumbersome: every iteration requires creating and uploading a new model.tar.gz that bundles the updated script.
Is it possible to provide the inference.py script separately from the compressed model archive?
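For context, a handler script of this shape is what I mean (a minimal sketch; the text-classification task and the handler bodies are illustrative, not my actual code):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer, pipeline

def model_fn(model_dir):
    # Called once at container startup; model_dir is the unpacked model.tar.gz.
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSequenceClassification.from_pretrained(model_dir)
    return pipeline("text-classification", model=model, tokenizer=tokenizer)

def predict_fn(data, model):
    # Called per request with the deserialized payload and model_fn's return value.
    inputs = data.pop("inputs", data)
    return model(inputs)
```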
It is possible to provide the inference.py script separately if your model.tar.gz is in S3. I’m not sure whether it works with models loaded directly from the Hub.
When you create the HuggingFaceModel() object, pass it source_dir (the local folder containing your inference.py script), entry_point ("inference.py"), and model_data (the S3 URL of your model.tar.gz).
Then the next time you call HuggingFaceModel.deploy(), it will use the inference script from your local folder and the model from S3.
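A minimal sketch of what that looks like, assuming a PyTorch-based model; the bucket name, IAM role, instance type, and framework versions below are placeholders you would swap for your own:

```python
from sagemaker.huggingface import HuggingFaceModel

# model_data points at the existing archive in S3; entry_point/source_dir
# point at the local script, so repackaging model.tar.gz is unnecessary.
huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/model/model.tar.gz",  # placeholder S3 URL
    role="arn:aws:iam::111122223333:role/my-sagemaker-role",  # placeholder role
    entry_point="inference.py",
    source_dir="code",  # local folder containing inference.py
    transformers_version="4.26",
    pytorch_version="1.13",
    py_version="py39",
)

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # placeholder instance type
)
```

On each deploy, the SDK packages source_dir and uploads it separately from the model archive, so iterating on inference.py no longer requires rebuilding and re-uploading the full model.tar.gz.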
Where should the inference.py file be inside the model tarball? Should it be inside a “code” subdirectory, or directly at the root of the archive?