Help for inference.py code

GenV · February 25, 2022, 5:10pm

Hi,
I’m using the SageMaker / Hugging Face inference container. For the model.tar.gz that the endpoint requires, I’m using this inference code:

import os
import torch
from transformers import pipeline, T5Tokenizer

T5_WEIGHTS_NAME = "t5.pt"


def model_fn(model_dir):
    # t5.pt is loaded as a complete model object, not a state dict
    model = torch.load(os.path.join(model_dir, T5_WEIGHTS_NAME))
    tokenizer = T5Tokenizer.from_pretrained(model_dir)

    # pipeline() takes a device index: 0 for the first GPU, -1 for CPU
    device = 0 if torch.cuda.is_available() else -1

    generation = pipeline(
        "text2text-generation",
        model=model,
        tokenizer=tokenizer,
        device=device,
        max_length=1024,
    )

    return generation

I get worse performance than with my local code, which looks like this:

tokenized_text = self.tokenizer_nl(
    input_text, truncation=True, padding="max_length", return_tensors="pt"
)

source_ids = tokenized_text["input_ids"].to(self.device, dtype=torch.long)
source_mask = tokenized_text["attention_mask"].to(self.device, dtype=torch.long)

generated_ids = self.model.generate(
    input_ids=source_ids, attention_mask=source_mask, max_length=1024
)

pred = self.tokenizer_sql.decode(
    generated_ids[0],
    spaces_between_special_tokens=False,
    skip_special_tokens=True,
)

So I want to put this code in my inference.py, without using the pipeline, but I don’t know how to write it. Can someone help me? Thanks!

philschmid · February 25, 2022, 5:34pm

Hello @GenV,

Which versions of transformers and pytorch are you using on SageMaker, and which are you using locally?
Do you use a GPU on your local machine as well?

GenV · February 25, 2022, 7:22pm

@philschmid On SageMaker I’m using torch 1.9.1, transformers 4.12.3 and sagemaker 2.77.1

Locally I’m using torch 1.10.2, transformers 4.12.5

Both on GPU.

philschmid · February 28, 2022, 2:56pm

And what is the latency difference you see? Could you test with the same versions locally as well?

GenV · February 28, 2022, 5:01pm

@philschmid There is a difference in latency because I have two different GPUs locally and remotely, but it is not significant (it is very small). With the same versions I still see differences when using the pipeline() method. I also see a difference between using a .pt and a .bin model. I have now switched to the .bin model because I get fewer errors than with .pt.

philschmid · March 1, 2022, 1:26pm

@GenV I might have misunderstood your question, sorry.
Since you have now switched to pytorch_model.bin, you should be able to deploy without creating an inference.py by just providing the env variables when creating the endpoint, similar to the snippet below.

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'t5-base',
	'HF_TASK':'text2text-generation'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.12',
	pytorch_version='1.9',
	py_version='py38',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.g4dn.xlarge' # ec2 instance type
)

predictor.predict({
	'inputs': "Меня зовут Вольфганг и я живу в Берлине"
})

You can find more information in the documentation: Deploy models to Amazon SageMaker

GenV · March 1, 2022, 2:27pm

@philschmid thank you for the answer. I’m using this code, but with my own model. So my code is:

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
   model_data=model_data_url,  # path to your trained sagemaker model
   role=role, # iam role with permissions to create an Endpoint
   transformers_version=transformers_version, # "4.12.3"
   pytorch_version=pytorch_version, #"1.9.1"
   py_version=py_version # "py38"
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
   initial_instance_count=1,
   instance_type="ml.p3.2xlarge"
)

My question is how to create my own inference.py (that is, how to implement the model_fn and transform_fn methods), because I don’t want to use the pipeline() method but my own implementation.

philschmid · March 1, 2022, 2:38pm

You can take a look here: Deploy models to Amazon SageMaker

GenV · March 1, 2022, 2:52pm

Thank you. I don’t see any examples or implementations there. Are there practical examples I could use as a starting point?

philschmid · March 7, 2022, 8:03am

@GenV I created an example of how to do it: notebooks/sagemaker-notebook.ipynb at master · huggingface/notebooks · GitHub
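
For readers of this thread, here is a minimal sketch of what such an inference.py could look like. It is not the code from the linked notebook. It moves the tokenize / generate / decode steps from the first post into the model_fn and predict_fn hooks that the SageMaker Hugging Face Inference Toolkit lets you override (transform_fn is the alternative when you want to handle input parsing and output encoding yourself). Assumptions: the model directory holds save_pretrained() output rather than t5.pt, a single tokenizer is used for both encoding and decoding (the local code has tokenizer_nl and tokenizer_sql), and the request body is JSON with an "inputs" key. The "generated_text" response key is a name chosen for illustration.

```python
MAX_LENGTH = 1024  # same generation limit as in the local code


def model_fn(model_dir):
    """Load the model and tokenizer once, when the endpoint container starts."""
    # Heavy imports live here so predict_fn can be tested without loading a model.
    import torch
    from transformers import T5ForConditionalGeneration, T5Tokenizer

    device = "cuda" if torch.cuda.is_available() else "cpu"
    # Assumption: model_dir contains config.json, pytorch_model.bin and tokenizer files.
    model = T5ForConditionalGeneration.from_pretrained(model_dir).to(device).eval()
    tokenizer = T5Tokenizer.from_pretrained(model_dir)
    return model, tokenizer, device


def predict_fn(data, model_and_tokenizer):
    """Tokenize, generate and decode without going through pipeline()."""
    model, tokenizer, device = model_and_tokenizer
    input_text = data["inputs"]

    encoded = tokenizer(
        input_text, truncation=True, padding="max_length", return_tensors="pt"
    )
    generated_ids = model.generate(
        input_ids=encoded["input_ids"].to(device),
        attention_mask=encoded["attention_mask"].to(device),
        max_length=MAX_LENGTH,
    )
    prediction = tokenizer.decode(
        generated_ids[0],
        spaces_between_special_tokens=False,
        skip_special_tokens=True,
    )
    # "generated_text" is an illustrative key, not one required by the toolkit.
    return {"generated_text": prediction}
```

With this in place, predictor.predict({"inputs": "..."}) should reach predict_fn as a dict, since the toolkit's default input handling decodes the JSON body before calling it.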

GenV · March 8, 2022, 9:34am

@philschmid Thank you! It is very useful.
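
A note for later readers: for a custom script to be picked up, the toolkit expects it at code/inference.py inside model.tar.gz, next to the model files at the archive root. Below is a rough sketch of building such an archive. The function name and the paths are hypothetical, and it assumes the model files sit directly in one local directory.

```python
import tarfile
from pathlib import Path


def build_model_archive(model_dir, inference_script, output_path="model.tar.gz"):
    """Pack model files at the archive root and the script under code/."""
    model_dir = Path(model_dir)
    with tarfile.open(output_path, "w:gz") as archive:
        # Model and tokenizer files (config.json, pytorch_model.bin, ...) go at the root.
        for path in sorted(model_dir.iterdir()):
            if path.is_file():
                archive.add(str(path), arcname=path.name)
        # User code is looked up at code/inference.py.
        archive.add(str(inference_script), arcname="code/inference.py")
    return output_path
```

Write the archive outside model_dir so it does not try to include itself, then upload it to S3 and pass that location as model_data to HuggingFaceModel, as in the deployment code earlier in the thread.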
