Inference Toolkit - Init and default template for custom inference

Hey,

Had some quick questions regarding the Inference Toolkit. Is there a way to add an init function in the custom inference.py script? I was thinking I could just add what I needed in the model_fn function, but when I tried running just the basics, I got the error attached below. This leads into the second question.

Do you have a default template for the custom inference.py script? I saw that you had some documentation on GitHub - aws/sagemaker-huggingface-inference-toolkit, but I was wondering if you might have an actual script we could modify to our liking.

Thanks!

# This is the script that will be used in the inference container
import os 
import json 
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

def model_fn(model_dir):
    """
    Load the model and tokenizer for inference 
    """
    device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

    tokenizer = AutoTokenizer.from_pretrained(model_dir)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_dir).to(device)
    
    model_dict = {'model':model, 'tokenizer':tokenizer}
    
    return model_dict 


def predict_fn(input_data, model):
    """
    Make a prediction with the model
    """
    

    text = input_data.pop('inputs')
    parameters = input_data.pop('parameters', None)
    
    tokenizer = model['tokenizer']
    model = model['model']

    # Parameters may or may not be passed    
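    input_ids = tokenizer(text, truncation=True, padding='longest', return_tensors="pt").input_ids
    # Move the inputs to the same device as the model (model_fn may have placed it on the GPU)
    input_ids = input_ids.to(model.device)
    output = model.generate(input_ids, **parameters) if parameters is not None else model.generate(input_ids)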
    
    return tokenizer.batch_decode(output, skip_special_tokens=True)[0]


def input_fn(request_body, request_content_type):
    """
    Transform the input request to a dictionary
    """
    request = json.loads(request_body)

    return request


def output_fn(prediction, response_content_type):
    """
    Return model's prediction
    """
    # Serialize to JSON so the response body is a string rather than a Python dict
    return json.dumps({'generated_text': prediction})

Actually sorry, I realized there were a couple of mistakes above. I also found the handler_service.py. I am still running into the same error though. I added only one custom function, predict_fn, and basically copied the original predict function, except that the inputs parameter is now labelled text1. It still produces the same error. For context, inference.py was put in model1.tar.gz under the folder code, which is what the instructions say. My original model, model.tar.gz without the custom inference.py, is working fine. The config files are identical. The only difference between the two archives is that the most recent one, model1.tar.gz, contains code/inference.py.

Thanks.

import os 
import json 
import torch

def predict_fn(self, data):
        """The predict handler is responsible for model predictions. Calls the `__call__` method of the provided `Pipeline`
        on decoded_input_data deserialized in input_fn. Runs prediction on GPU if is available.
        The predict handler can be overridden to implement the model inference.
        Args:
            data (dict): deserialized decoded_input_data returned by the input_fn
        Returns:
            obj (dict): prediction result.
        """

        # pop inputs for pipeline
        inputs = data.pop("text1", data)
        parameters = data.pop("parameters", None)

        # pass inputs with all kwargs in data
        if parameters is not None:
            prediction = self.model(inputs, **parameters)
        else:
            prediction = self.model(inputs)
        return prediction

from sagemaker.huggingface import HuggingFaceModel
from sagemaker.serializers import JSONSerializer
from sagemaker.deserializers import BytesDeserializer
import sagemaker

model_name = 'model1'
endpoint_name = 'endpoint1'

role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data="s3://call-summarization/model1.tar.gz",  
   role=role,
   transformers_version="4.6.1", 
   pytorch_version="1.7.1",
   py_version='py36',
   name=model_name
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.xlarge',
    endpoint_name=endpoint_name,
)

Hey @ujjirox, thank you for your detailed response. I am trying to recreate it and provide an example that works.
But why do you want to use a custom inference.py? Looking at your code, it seems you are not doing anything special. You should be able to deploy your model by providing HF_TASK: "summarization" with it.

Like this, and remove the inference.py from your archive:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

model_name = 'model1'
endpoint_name = 'endpoint1'

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_TASK': 'summarization'
}

role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data="s3://call-summarization/model1.tar.gz",  
   role=role,
   transformers_version="4.6.1", 
   pytorch_version="1.7.1",
   env=hub,
   py_version='py36',
   name=model_name
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type='ml.g4dn.xlarge',
    endpoint_name=endpoint_name,
)

Hey Phil,

Thanks for the response. With the custom inference.py, the example above was just a test to make sure it is working before I start customizing it even further. Would the solution you suggested work when including the custom inference.py in the model? Also, if we want to modify the model loading, do we use model_fn or load_fn? I thought I saw both versions floating around.

Thanks!

Okay, got it! I'll come back to you with a working example/steps.

No, it wouldn’t. When you provide a custom inference.py, the hub config should be ignored, unless you are not overriding model_fn.

You need to use model_fn and not load_fn.
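
To illustrate the hook names, here is a minimal sketch of an inference.py. It assumes the toolkit calls model_fn(model_dir) once at startup and predict_fn(data, model) for each request, which matches the signatures in the first script in this thread. The summarization pipeline and the "text1" key are taken from the examples above and are only assumptions about your setup, not a definitive implementation.

```python
# inference.py (sketch): hooks defined here are meant to replace the toolkit defaults.


def model_fn(model_dir):
    """Load the model once when the endpoint starts."""
    # Imported lazily so this module can be imported without transformers installed.
    from transformers import pipeline

    # Assumption: a summarization pipeline, as suggested earlier in the thread.
    return pipeline("summarization", model=model_dir, tokenizer=model_dir)


def predict_fn(data, model):
    """Run a prediction; `model` is whatever model_fn returned."""
    data = dict(data)  # copy so the caller's dict is not mutated
    inputs = data.pop("text1", data)  # "text1" is the key used in this thread
    parameters = data.pop("parameters", None)
    if parameters is not None:
        return model(inputs, **parameters)
    return model(inputs)
```

Note that predict_fn here is a plain function taking (data, model), not a method taking self.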

Instead of including the inference.py in the model.tar.gz, could you test providing it dynamically when creating the endpoint?

Like this:

    hf_model = HuggingFaceModel(
        model_data="s3://call-summarization/model1.tar.gz",  
        role=role,
        transformers_version="4.6.1", 
        pytorch_version="1.7.1",
        source_dir="code",
        py_version="py36",
        entry_point="inference.py",
    )

For this to work the file structure needs to be

code/
     inference.py
deploy.py

Sure, running it now. Could you elaborate on the deploy.py? I wasn’t aware this was a file we needed to have in our tar.gz. Thanks.

Edit: I think I see what you mean. Sorry about that.

With deploy.py I meant the script which is creating your endpoint. This could also be a notebook or so.

Gotchu. I figured that’s what you meant a couple minutes later.

I created an example with an inference.py included in a model.tar.gz. You can find the whole repository here: GitHub - philschmid/sample-custom-inference-sagemaker-huggingface
You can find the inference.py here: sample-custom-inference-sagemaker-huggingface/inference.py at master · philschmid/sample-custom-inference-sagemaker-huggingface

The structure of the archive is

code/
    inference.py
pytorch_model.bin
config.json
tokenizer.json
....
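
As a rough sketch of how an archive with this layout could be built, the helper below packs the model files at the root and places the script under code/inference.py. The function name build_model_archive and its arguments are hypothetical, introduced only for illustration; the linked repository may build the archive differently.

```python
import tarfile
from pathlib import Path


def build_model_archive(model_dir, inference_script, archive_path="model.tar.gz"):
    """Pack model files at the archive root and the script under code/."""
    model_dir = Path(model_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # Model files (pytorch_model.bin, config.json, ...) go at the root.
        for item in sorted(model_dir.iterdir()):
            if item.is_file():
                tar.add(str(item), arcname=item.name)
        # The custom script goes into the code/ folder.
        tar.add(str(inference_script), arcname="code/inference.py")
    return archive_path
```

Write the archive outside model_dir so it does not try to include itself, then upload it to S3 and pass its location as model_data.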

Awesome! This looks good. I am going to take a closer look tomorrow since it’s pretty late in my timezone. But hopefully it should all be good. Cheers!

SOLVED - source_dir needs to be added as argument

Hey Phil,

It looks like almost everything is working! I think this is the last issue since the changes I am making are no longer causing issues it seems. I tried to import nltk and put it in requirements.txt but it looks like it wasn’t imported correctly. The requirements.txt is sitting in the code folder along with inference.py. Is this part of an argument that needs to be included during deployment?

Currently trying to specify source_dir. I will add the dependencies argument if that doesn’t work, and then as a last resort:

os.system('pip install nltk')
import nltk

Thanks.

You can add a requirements.txt to the code/ folder in the archive, upload the archive to S3, and provide it as model_data. This should work. You can use my example to test it.
Additionally, the FrameworkModel class does have a dependencies attribute, but adding your dependencies that way looks far more complex.

  • dependencies (list[str]) – A list of paths to directories (absolute or relative) with any additional libraries that will be exported to the container (default: []). The library folders will be copied to SageMaker in the same folder where the entrypoint is copied. If ‘git_config’ is provided, ‘dependencies’ should be a list of relative locations to directories with any additional libraries needed in the Git repo. If the source_dir points to S3, code will be uploaded and the S3 location will be used instead.

Example
The following call

Model(entry_point='inference.py', … dependencies=['my/libs/common', 'virtual-env'])
results in the following inside the container:

$ ls
opt/ml/code
|------ inference.py
|------ common
|------ virtual-env

This is not supported with “local code” in Local Mode.

If you want to go with source_dir and entry_point, I would suggest building a helper function, such as install_dependencies, that is executed before all other imports.
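
A minimal sketch of such a helper, assuming pip is available inside the container. The name install_dependencies comes from the suggestion above; the mapping argument and the injectable runner are hypothetical additions for illustration and testing.

```python
import importlib.util
import subprocess
import sys


def install_dependencies(packages, runner=subprocess.check_call):
    """Install every package that cannot be imported yet.

    `packages` maps a top-level import name to its pip name,
    for example {"nltk": "nltk"}.
    """
    installed = []
    for import_name, pip_name in packages.items():
        # find_spec returns None when the top-level module is not installed.
        if importlib.util.find_spec(import_name) is None:
            # Use the interpreter that runs the endpoint, not whatever pip is on PATH.
            runner([sys.executable, "-m", "pip", "install", pip_name])
            installed.append(pip_name)
    return installed
```

At the top of inference.py you would call install_dependencies({"nltk": "nltk"}) before the import nltk line, so the package is present by the time it is imported.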

sounds good. Thanks very much! Appreciate it.