Inference Hyperparameters

Hi,

I am interested in deploying a HuggingFace Model on AWS SageMaker. Let’s say, for example, I deploy “google/pegasus-large” on AWS. You have very generously given the code to deploy this, shown below. I was wondering if, as part of the predict function, we can pass additional arguments. I would like to incorporate a custom length penalty as well as a repetition penalty. Would you be able to share where in this code these arguments would be inserted?

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'google/pegasus-large',
	'HF_TASK':'summarization'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.6.1',
	pytorch_version='1.7.1',
	py_version='py36',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
	'inputs': "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."
})

This is what the inference looks like right now in my notebook:

batch = tokenizer([source], truncation=True, padding='longest', return_tensors="pt").to('cuda')
predicted = tokenizer.batch_decode(model.generate(**batch, length_penalty=1.5, repetition_penalty=4.0), skip_special_tokens=True)[0]

Thanks!

Hey @ujjirox,

Nice to hear that you want to use Amazon SageMaker for deploying your summarization model, and yes, it is possible.
The Inference Toolkit supports the same functionality as the transformers pipelines. You can provide all of your additional prediction parameters in the parameters attribute of the request body. I attached an example below of what the request body looks like.

{
	"inputs": "Hugging Face, the winner of VentureBeat’s Innovation in Natural Language Process/Understanding Award for 2021, is looking to level the playing field. The team, launched by Clément Delangue and Julien Chaumond in 2016, was recognized for its work in democratizing NLP, the global market value for which is expected to hit $35.1 billion by 2026. This week, Google’s former head of Ethical AI Margaret Mitchell joined the team.",
	"parameters": {
		"repetition_penalty": 4.0,
		"length_penalty": 1.5
	}
}
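Since the body above is plain JSON, you can also build it in Python before sending it to the endpoint. A minimal sketch (the `build_request` helper is purely illustrative, not part of the SageMaker SDK):

```python
import json

def build_request(text, **generate_kwargs):
    """Build the request body the Inference Toolkit expects:
    the raw text under 'inputs' and any generate() kwargs
    under 'parameters'."""
    return {"inputs": text, "parameters": generate_kwargs}

body = build_request(
    "Hugging Face, the winner of VentureBeat's Innovation in NLP Award for 2021 ...",
    repetition_penalty=4.0,
    length_penalty=1.5,
)
payload = json.dumps(body)  # this JSON string is what reaches the endpoint
```

Passing `body` directly to `predictor.predict(...)` serializes it to exactly this kind of JSON under the default JSON serializer.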

You can find more information on the parameters in the Pipelines — transformers 4.10.1 documentation or in our SageMaker documentation: Deploy models to Amazon SageMaker

Here is an end-to-end example for google/pegasus-large showing how to deploy it and use your parameters.

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'google/pegasus-large',
	'HF_TASK':'summarization'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.6.1',
	pytorch_version='1.7.1',
	py_version='py36',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

long_text= """
Hugging Face, the winner of VentureBeat’s Innovation in Natural Language Process/Understanding Award for 2021, 
is looking to level the playing field. The team, launched by Clément Delangue and Julien Chaumond in 2016,
was recognized for its work in democratizing NLP, the global market value for which is expected to 
hit $35.1 billion by 2026. This week, Google’s former head of Ethical AI Margaret Mitchell joined the team.
"""

parameters = {'repetition_penalty':4.,'length_penalty': 1.5}

predictor.predict({"inputs":long_text,"parameters":parameters})

Awesome! Thanks a lot! This is super helpful. Cheers!

@philschmid

Hey Phil,

Quick question if you don’t mind: are there any arguments in the deployment script for specifying tags? I am actually getting hit up by my cloud governance team that I need to specify tags, otherwise it triggers an alert. By tags, I am referring to Manage tags - AWS Resource Groups and Tags

Thanks!

Hey @ujjirox,

Yes, you can add tags to all of the related inference resources, meaning, in the console:

  • Models
  • Endpoint Configurations
  • Endpoints

To add these tags you need to pass a tags key to the .deploy method, which is a list of dictionaries with Key/Value pairs. See below for an example.

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge', # ec2 instance type
    tags=[{'Key':'STAGE', 'Value':'Production'},
          {'Key':'MAINTAINER','Value':'ujjrox'}]
)

This will add the tags to all three resources.
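Since governance tooling usually rejects malformed tags, it can help to sanity-check the list shape before calling .deploy. A small sketch (the `validate_tags` helper is hypothetical, not part of the SDK); it checks the same `{'Key': ..., 'Value': ...}` shape that boto3’s tagging APIs expect:

```python
def validate_tags(tags):
    """Check that tags is a list of {'Key': str, 'Value': str} dicts,
    the shape the SageMaker/boto3 tagging APIs expect."""
    return (
        isinstance(tags, list)
        and all(
            isinstance(t, dict)
            and set(t) == {"Key", "Value"}
            and all(isinstance(v, str) for v in t.values())
            for t in tags
        )
    )

tags = [
    {"Key": "STAGE", "Value": "Production"},
    {"Key": "MAINTAINER", "Value": "ujjirox"},
]
ok = validate_tags(tags)
```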

@philschmid Thank you very much! I’ll leave you in peace!

You’re welcome! Don’t hesitate to ask if you have further questions, or if your company would like to have better support: Expert Acceleration Program – Hugging Face


Hi.

I’ve been stuck here for a while and I can’t clarify myself.

How do you know what hyperparameters can be used in the pipeline? I mean:

parameters = {
                  'repetition_penalty': 4.0,
                  'length_penalty': 1.5
}

input_pipeline = {
                  "inputs":long_text,
                  "parameters":parameters
}

what other parameters are accepted in “parameters”?

I tried looking at the SummarizationPipeline class, but I can’t find such parameters there, so this is where I’m stuck… I suppose it will also depend on the exact model you use, here for example: google/pegasus-large. So my question is:

For a particular model, how do I get the exact list of accepted hyperparameters to be used in the pipeline?

Hello,

Thanks for the question. To find all the (named) parameters you need to dig a bit deeper into the code. For example, the SummarizationPipeline uses the generate method under the hood. There you can find all the available kwargs.
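Until the documentation improves, one pragmatic option is to keep a small allowlist of generate() kwargs and check a parameters dict against it before sending a request. The names below are a non-exhaustive subset taken from the transformers 4.x generate() signature; the `check_parameters` helper itself is just an illustrative sketch:

```python
# Non-exhaustive set of kwargs accepted by generate() in transformers 4.x.
KNOWN_GENERATE_KWARGS = {
    "max_length", "min_length", "do_sample", "early_stopping", "num_beams",
    "temperature", "top_k", "top_p", "repetition_penalty", "length_penalty",
    "no_repeat_ngram_size", "num_return_sequences", "bad_words_ids",
}

def check_parameters(parameters):
    """Return the keys that generate() would not recognise (empty set = OK)."""
    return set(parameters) - KNOWN_GENERATE_KWARGS

unknown = check_parameters({"repetition_penalty": 4.0, "length_penalty": 1.5})
```

This also catches typos (e.g. a misspelled "lenght_penalty") before they silently do nothing on the endpoint.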

I opened a thread internally to improve documentation on this.


Okay.

Here you suggested adding the truncation parameter. This is a parameter related to the tokenizer. Why can it be used as a “pipeline parameter”? I’m looking for a solution like:

parameters = {
                  'truncation': True,
                  'max_length': 256,
                  'padding': True,
}

But I don’t know whether the names are right or not (except truncation, since you suggested that name). It seems that the “max_length” parameter has no effect on predictions, because the predictions are exactly the same with different values.

Yes,
the parameter names would be correct, but the currently released transformers version only supports padding and truncation as **kwargs, if you take a look here:

But the next transformers version will allow all tokenizer **kwargs to be passed in.

Oops! This is very bad news :(.

I have a model for classification trained on texts truncated at 256 tokens and I need to deploy it. What would you recommend I do in that case?

Greetings!

You don’t need to provide the parameters to run inference; it should work like a charm. Padding is not required for inference.
But if you still want to set the max length to 256, you can modify the tokenizer_config to 256. Example: tokenizer_config.json · distilbert-base-uncased-finetuned-sst-2-english at main

I don’t understand well…

Why do you say I don’t need to provide the parameters at the inference step? Does a model trained with texts truncated at 256 tokens perform well on texts with more than 256 tokens?

On the other hand, where should I put this tokenizer_config…?

Not providing them for inference will definitely increase the inference speed, since the input won’t be padded and will be used as it is.
@Oigres, since the underlying original model was properly pre-trained on 512 tokens, it should perform decently well, yes.
You can add it to your model.tar.gz, but I would do some tests in advance before deploying.
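Regarding the tokenizer_config edit, a minimal sketch of what writing that file could look like before repackaging it into model.tar.gz (the file contents here are illustrative; model_max_length is the key the tokenizer reads for its length cap):

```python
import json
import os
import tempfile

# Illustrative: write a tokenizer_config.json that caps tokenized inputs
# at 256 tokens; this file would then be repackaged into model.tar.gz
# alongside the model weights.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "tokenizer_config.json")
    config = {"model_max_length": 256}
    with open(cfg_path, "w") as f:
        json.dump(config, f)

    # Read it back to confirm the value round-trips.
    with open(cfg_path) as f:
        reloaded = json.load(f)
```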


Okay.

Finally, I retrained the model with 512 tokens and will deploy it as is.

Thank you so much.
