Okay, got it! I'll come back to you with a working example/steps.
No, it wouldn't. When providing a custom inference.py, the hub config should be ignored, unless you are not overwriting the model_fn. You need to use model_fn and not load_fn.
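For reference, here is a minimal sketch of what such an inference.py could look like; the summarization pipeline and payload format are assumptions for illustration, but model_fn and predict_fn are the override hooks the SageMaker Hugging Face Inference Toolkit looks for:

# code/inference.py - minimal sketch; the summarization task is an assumption
from transformers import pipeline

def model_fn(model_dir):
    # Called once when the endpoint starts; model_dir is where model.tar.gz was extracted
    return pipeline("summarization", model=model_dir, tokenizer=model_dir)

def predict_fn(data, model):
    # data is the deserialized request body, e.g. {"inputs": "..."}
    return model(data["inputs"])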
Could you test providing the inference.py dynamically when creating the endpoint, instead of packaging it in the model.tar.gz? Like this:
from sagemaker.huggingface import HuggingFaceModel

hf_model = HuggingFaceModel(
    model_data="s3://call-summarization/model1.tar.gz",
    role=role,
    transformers_version="4.6.1",
    pytorch_version="1.7.1",
    py_version="py36",
    entry_point="inference.py",  # uploaded from the local source_dir below
    source_dir="code",
)
For this to work, the file structure needs to be:
code/
    inference.py
deploy.py
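For completeness, a sketch of the deploy step that deploy.py could then run; the instance type and example payload are assumptions, not taken from this thread:

predictor = hf_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",  # assumption: pick an instance that fits your model
)

print(predictor.predict({"inputs": "Transcript of the call to summarize ..."}))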