I’m deploying some inference endpoints on SageMaker using the HuggingFace Inference Toolkit, and am overriding some of the default methods (model_fn and predict_fn) as described here:
This works, but the problem is that when I want to test changes to my inference.py, it is slow and cumbersome to create and upload a new model.tar.gz file that includes the new script. Is it possible to provide the inference.py script separately from the compressed model archive?
No, sadly there is no way around it, except that you can test your inference.py locally or load the model in model_fn.
It is possible to provide only the inference.py script if your model.tar.gz is already in S3. I’m not sure whether it works with the Hub.
When you create the HuggingFaceModel() object, give it source_dir (the local folder where the inference.py script lives), entry_point (inference.py), and model_data (the S3 URL).
Then the next time you call HuggingFaceModel.deploy(), it will use the inference script from your local folder and the model from S3.
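A minimal sketch of that setup, assuming the model archive is already in S3 and you have an IAM role ARN; the bucket name, role ARN, and framework versions below are placeholders, not values from this thread:

```python
from sagemaker.huggingface import HuggingFaceModel

# Point model_data at the existing archive in S3 and supply the
# inference script from a local folder (source_dir + entry_point).
huggingface_model = HuggingFaceModel(
    model_data="s3://my-bucket/model.tar.gz",          # placeholder S3 URL
    role="arn:aws:iam::111122223333:role/SageMakerRole",  # placeholder role
    entry_point="inference.py",  # script with model_fn / predict_fn overrides
    source_dir="code",           # local folder containing inference.py
    transformers_version="4.26", # illustrative versions; pick supported ones
    pytorch_version="1.13",
    py_version="py39",
)

# Each deploy() picks up the current local inference.py, so you can
# iterate on the script without manually rebuilding the archive.
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)
```

This needs live AWS credentials to actually run, so treat it as a template rather than something to execute as-is.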
That’s true, but behind the scenes a new model.tar.gz is built and used.
Where should the inference.py file go inside the model tarball? Should it be inside a “code” subdirectory, or directly at the root of the tarball?
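To my understanding, the Inference Toolkit looks for the script under a top-level code/ directory in the archive, next to the model files. A small stdlib sketch that builds an archive with that layout (the file contents here are stand-ins, not real model artifacts):

```python
import os
import tarfile

# Stage a directory mirroring the expected archive layout:
#   model.tar.gz
#   ├── config.json            (model artifacts at the root)
#   └── code/
#       └── inference.py       (custom model_fn / predict_fn overrides)
os.makedirs("model_dir/code", exist_ok=True)
with open("model_dir/code/inference.py", "w") as f:
    f.write("# custom model_fn / predict_fn overrides go here\n")
with open("model_dir/config.json", "w") as f:
    f.write("{}\n")  # stand-in for the real model files

# Add entries at the archive root via arcname, so members are
# "config.json" and "code/inference.py", not "model_dir/...".
with tarfile.open("model.tar.gz", "w:gz") as tar:
    for name in ("config.json", "code/inference.py"):
        tar.add(os.path.join("model_dir", name), arcname=name)

with tarfile.open("model.tar.gz", "r:gz") as tar:
    print(sorted(tar.getnames()))  # ['code/inference.py', 'config.json']
```

The key detail is the arcname argument: without it, tar.add() would nest everything under model_dir/ and the toolkit would not find code/inference.py.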