I had some quick questions regarding the Inference Toolkit. Is there a way to add an init function in the custom inference.py script? I was thinking I could just add what I needed in the model_fn function, but when I tried running just the basics, I got the error attached below. This leads into the second question.
Do you have a default template for the custom inference.py script? I saw that you have some documentation on GitHub - aws/sagemaker-huggingface-inference-toolkit, but I was wondering if you might have an actual script we could modify to our liking.
Thanks!
# This is the script that will be used in the inference container
import os
import json
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def model_fn(model_dir):
    """
    Load the model and tokenizer for inference
    """
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir).to(device)
    model_dict = {'model': model, 'tokenizer': tokenizer}
    return model_dict

def predict_fn(input_data, model):
    """
    Make a prediction with the model
    """
    text = input_data.pop('inputs')
    parameters = input_data.pop('parameters', None)
    tokenizer = model['tokenizer']
    model = model['model']
    # Parameters may or may not be passed
    input_ids = tokenizer(text, truncation=True, padding='longest', return_tensors="pt").input_ids
    output = model.generate(input_ids, **parameters) if parameters is not None else model.generate(input_ids)
    return tokenizer.batch_decode(output, skip_special_tokens=True)[0]

def input_fn(request_body, request_content_type):
    """
    Transform the input request to a dictionary
    """
    request = json.loads(request_body)
    return request

def output_fn(prediction, response_content_type):
    """
    Return model's prediction
    """
    return {'generated_text': prediction}
Actually sorry, I realized there were a couple of mistakes above. I also found handler_service.py. I am still running into the same error though. I added only one custom function, predict_fn, and basically copied the original predict function, except that the inputs parameter is now labelled text1. It still produces the same error. For context, inference.py was put in model1.tar.gz under the folder code, as the instructions say. My original model, model.tar.gz, without the custom inference.py, is working fine. The config files are identical. The only difference between the two archives is that the most recent, model1.tar.gz, contains code/inference.py.
Thanks.
import os
import json
import torch

def predict_fn(self, data):
    """The predict handler is responsible for model predictions. Calls the `__call__` method of the provided `Pipeline`
    on decoded_input_data deserialized in input_fn. Runs prediction on GPU if is available.
    The predict handler can be overridden to implement the model inference.
    Args:
        data (dict): deserialized decoded_input_data returned by the input_fn
    Returns:
        obj (dict): prediction result.
    """
    # pop inputs for pipeline
    inputs = data.pop("text1", data)
    parameters = data.pop("parameters", None)
    # pass inputs with all kwargs in data
    if parameters is not None:
        prediction = self.model(inputs, **parameters)
    else:
        prediction = self.model(inputs)
    return prediction
from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import BytesDeserializer
import sagemaker

model_name = 'model1'
endpoint_name = 'endpoint1'
role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data="s3://call-summarization/model1.tar.gz",
    role=role,
    transformers_version="4.6.1",
    pytorch_version="1.7.1",
    py_version='py36',
    name=model_name
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.xlarge',
    endpoint_name=endpoint_name,
)
Hey @ujjirox, thank you for your detailed response. I am trying to recreate it and provide an example that works.
But why do you want to use a custom inference.py? From looking at your code, it seems you are not doing anything special. You should be able to deploy your model by providing HF_TASK: "summarization" with it,
like this, and remove the inference.py from your archive.
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

model_name = 'model1'
endpoint_name = 'endpoint1'

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_TASK': 'summarization'
}
role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    model_data="s3://call-summarization/model1.tar.gz",
    role=role,
    transformers_version="4.6.1",
    pytorch_version="1.7.1",
    env=hub,
    py_version='py36',
    name=model_name
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.xlarge',
    endpoint_name=endpoint_name,
)
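Once the endpoint is up, a request to it is a JSON object with an inputs field and an optional parameters field, which is the shape the predict_fn code earlier in this thread pops apart. A minimal sketch of building such a payload follows. build_payload is a hypothetical helper, not part of the SageMaker SDK, and the generation parameters shown are only examples.

```python
import json


def build_payload(text, **parameters):
    """Build the request body: 'inputs' plus an optional 'parameters' dict."""
    payload = {"inputs": text}
    if parameters:
        payload["parameters"] = parameters
    return payload


payload = build_payload("Long call transcript goes here.", max_length=60, min_length=10)
print(json.dumps(payload))

# With the predictor returned by deploy(), the request would be sent as:
# predictor.predict(payload)
```

Leaving out the keyword arguments sends only inputs, so the pipeline defaults apply.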
Thanks for the response. With the custom inference.py, the example above was just a test to make sure it is working before I start customizing it even further. Would the solution you suggested work when including the custom inference.py in the model? Also, if we want to modify the model loading, do we use model_fn or load_fn? I thought I saw both versions floating around.
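On the model_fn versus load_fn question: the toolkit README mentioned earlier documents model_fn(model_dir) as the override for model loading and predict_fn(data, model) for prediction. As far as I can tell, the load-style names are internal to handler_service.py, and its methods take self, which a function in inference.py does not. A minimal sketch of a code/inference.py under that assumption follows. The summarization task and the lazy import are choices made for illustration, not the toolkit's own code.

```python
def model_fn(model_dir):
    """Load whatever predict_fn needs; called when the model is loaded."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    return pipeline("summarization", model=model_dir, tokenizer=model_dir)


def predict_fn(data, model):
    """Run a prediction; `model` is the object returned by model_fn."""
    inputs = data.pop("inputs", data)
    parameters = data.pop("parameters", None)
    if parameters is not None:
        return model(inputs, **parameters)
    return model(inputs)
```

Any one-time setup (an "init" step) can live at the top of model_fn, since its return value is what predict_fn receives as model.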
Awesome! This looks good. I am going to take a closer look tomorrow since it's pretty late in my timezone. But hopefully it should all be good. Cheers!
It looks like almost everything is working! I think this is the last issue since the changes I am making are no longer causing issues it seems. I tried to import nltk and put it in requirements.txt but it looks like it wasn't imported correctly. The requirements.txt is sitting in the code folder along with inference.py. Is this part of an argument that needs to be included during deployment?
Currently trying to specify source_dir, will add dependencies argument if that doesn't work and then last case resort -
You can add a requirements.txt to the code/ folder, rebuild the archive, upload it to S3, and provide it as model_data. This should work. You can use my example to test it.
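A minimal sketch of rebuilding the archive so that code/inference.py and code/requirements.txt sit next to the model files at the archive root. build_model_archive is a hypothetical helper and the directory names are placeholders. The upload to S3 is left out.

```python
import os
import tarfile


def build_model_archive(model_dir, archive_path):
    """Pack the contents of model_dir into a .tar.gz with entries at the archive root.

    Write archive_path outside model_dir so the archive does not include itself.
    """
    with tarfile.open(archive_path, "w:gz") as tar:
        for name in sorted(os.listdir(model_dir)):
            # arcname drops the leading model_dir so that code/ ends up at the root
            tar.add(os.path.join(model_dir, name), arcname=name)
    return archive_path


# Expected layout of model_dir before packing (placeholder names):
#   config.json, pytorch_model.bin, tokenizer files, ...
#   code/inference.py
#   code/requirements.txt
# build_model_archive("model1", "model1.tar.gz")
```

Listing the archive with tarfile.open(path).getnames() is a quick way to confirm that code/requirements.txt is really inside before uploading.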
Additionally, the FrameworkModel class does have the attribute dependencies, but adding your dependencies that way looks far more complex.
dependencies (list[str]) - A list of paths to directories (absolute or relative) with any additional libraries that will be exported to the container (default: ). The library folders will be copied to SageMaker in the same folder where the entrypoint is copied. If 'git_config' is provided, 'dependencies' should be a list of relative locations to directories with any additional libraries needed in the Git repo. If the source_dir points to S3, code will be uploaded and the S3 location will be used instead.
Example
The following call
Model(entry_point='inference.py', ... dependencies=['my/libs/common', 'virtual-env'])
results in the following inside the container:
$ ls
opt/ml/code
|------ inference.py
|------ common
|------ virtual-env
This is not supported with 'local code' in Local Mode.
If you want to go with source_dir and entry_point I would suggest building a helper function that is executed before all imports like install_dependencies.
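A minimal sketch of such a helper, assuming requirements.txt sits next to inference.py and that pip is available in the container's Python environment. The function names are hypothetical.

```python
import os
import subprocess
import sys


def pip_install_command(requirements_path):
    """Command that installs the requirements with the interpreter running this script."""
    return [sys.executable, "-m", "pip", "install", "-r", requirements_path]


def install_dependencies(requirements_path):
    """Install from requirements_path if the file exists; return True if pip was run."""
    if not os.path.isfile(requirements_path):
        return False
    subprocess.check_call(pip_install_command(requirements_path))
    return True


# At the very top of inference.py, before any third-party imports:
# install_dependencies(os.path.join(os.path.dirname(os.path.abspath(__file__)), "requirements.txt"))
# import nltk
```

Calling pip through sys.executable installs the packages into the same interpreter that will later import them.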