Hey @ujjirox, thanks for your detailed response. I am trying to recreate it and will provide a working example.
But why do you want to use a custom inference.py? From looking at your code, it seems you are not doing anything special. You should be able to deploy your model by providing HF_TASK: "summarization" as an environment variable and remove the inference.py from your archive, like this:
from sagemaker.huggingface import HuggingFaceModel
import sagemaker
model_name = 'model1'
endpoint_name = 'endpoint1'
# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_TASK': 'summarization'
}
role = sagemaker.get_execution_role()
# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data="s3://call-summarization/model1.tar.gz",
    role=role,
    transformers_version="4.6.1",
    pytorch_version="1.7.1",
    env=hub,
    py_version='py36',
    name=model_name
)
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.xlarge',
    endpoint_name=endpoint_name,
)