Inference Hyperparameters

Hi,

I am interested in deploying a HuggingFace Model on AWS SageMaker. Let’s say, for example, I deploy “google/pegasus-large” on AWS. You have very generously given the code to deploy this, shown below. I was wondering if, as part of the predict function, we can pass additional arguments. I would like to incorporate a custom length penalty as well as a repetition penalty. Would you be able to share where in this code these arguments would be inserted?

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'google/pegasus-large',
	'HF_TASK':'summarization'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.6.1',
	pytorch_version='1.7.1',
	py_version='py36',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

predictor.predict({
	'inputs': "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."
})

This is what the inference looks like right now in my notebook:

batch = tokenizer([source], truncation=True, padding='longest', return_tensors="pt").to('cuda')
predicted = tokenizer.batch_decode(model.generate(**batch, length_penalty=1.5, repetition_penalty=4.0), skip_special_tokens=True)[0]

Thanks!

Hey @ujjirox,

Nice to hear that you want to use Amazon SageMaker for deploying your summarization model, and yes, it is possible.
The Inference Toolkit supports the same functionality as the transformers pipelines. You can provide all of your additional prediction parameters in the parameters attribute of the request body. I attached an example below of what the request body looks like.

{
	"inputs": "Hugging Face, the winner of VentureBeat’s Innovation in Natural Language Process/Understanding Award for 2021, is looking to level the playing field. The team, launched by Clément Delangue and Julien Chaumond in 2016, was recognized for its work in democratizing NLP, the global market value for which is expected to hit $35.1 billion by 2026. This week, Google’s former head of Ethical AI Margaret Mitchell joined the team.",
	"parameters": {
		"repetition_penalty": 4.0,
		"length_penalty": 1.5
	}
}
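Since the body above is plain JSON, you can also build it in Python before sending it to the endpoint. A minimal sketch (the `build_request` helper is purely illustrative, not part of the SageMaker SDK):

```python
import json

def build_request(text, **generate_kwargs):
    """Build the request body the Inference Toolkit expects:
    the raw text under 'inputs' and any generate() kwargs
    under 'parameters'."""
    return {"inputs": text, "parameters": generate_kwargs}

body = build_request(
    "Hugging Face, the winner of VentureBeat's Innovation in NLP Award for 2021 ...",
    repetition_penalty=4.0,
    length_penalty=1.5,
)
payload = json.dumps(body)  # this JSON string is what reaches the endpoint
```

Passing `body` directly to `predictor.predict(...)` serializes it to exactly this kind of JSON under the default JSON serializer.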

You can find more information on the parameters in the Pipelines — transformers 4.10.1 documentation or in our SageMaker documentation: Deploy models to Amazon SageMaker

Here is an end-to-end example for google/pegasus-large showing how to deploy it and use your parameters.

from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()
# Hub Model configuration. https://huggingface.co/models
hub = {
	'HF_MODEL_ID':'google/pegasus-large',
	'HF_TASK':'summarization'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
	transformers_version='4.6.1',
	pytorch_version='1.7.1',
	py_version='py36',
	env=hub,
	role=role, 
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge' # ec2 instance type
)

long_text= """
Hugging Face, the winner of VentureBeat’s Innovation in Natural Language Process/Understanding Award for 2021, 
is looking to level the playing field. The team, launched by Clément Delangue and Julien Chaumond in 2016,
was recognized for its work in democratizing NLP, the global market value for which is expected to 
hit $35.1 billion by 2026. This week, Google’s former head of Ethical AI Margaret Mitchell joined the team.
"""

parameters = {'repetition_penalty':4.,'length_penalty': 1.5}

predictor.predict({"inputs":long_text,"parameters":parameters})

Awesome! Thanks a lot! This is super helpful. Cheers!

@philschmid

Hey Phil,

Quick question if you don’t mind: are there any arguments in the deployment script for specifying tags? I am actually getting hit up by my cloud governance team that I need to specify tags, otherwise it triggers an alert. By tags, I am referring to Manage tags - AWS Resource Groups and Tags

Thanks!

Hey @ujjirox,

Yes, you can add tags to all of the related inference resources, meaning, in the console:

  • Models
  • Endpoint Configurations
  • Endpoints

To add these tags you need to pass a tags key to the .deploy method, which is a list of dictionaries with Key/Value pairs. See below for an example.

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
	initial_instance_count=1, # number of instances
	instance_type='ml.m5.xlarge', # ec2 instance type
    tags=[{'Key':'STAGE', 'Value':'Production'},
          {'Key':'MAINTAINER','Value':'ujjrox'}]
)

This will add the tags to all three resources.
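Since governance tooling usually rejects malformed tags, it can help to sanity-check the list shape before calling .deploy. A small sketch (the `validate_tags` helper is hypothetical, not part of the SDK); it checks the same `{'Key': ..., 'Value': ...}` shape that boto3’s tagging APIs expect:

```python
def validate_tags(tags):
    """Check that tags is a list of {'Key': str, 'Value': str} dicts,
    the shape the SageMaker/boto3 tagging APIs expect."""
    return (
        isinstance(tags, list)
        and all(
            isinstance(t, dict)
            and set(t) == {"Key", "Value"}
            and all(isinstance(v, str) for v in t.values())
            for t in tags
        )
    )

tags = [
    {"Key": "STAGE", "Value": "Production"},
    {"Key": "MAINTAINER", "Value": "ujjirox"},
]
ok = validate_tags(tags)
```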

@philschmid Thank you very much! I’ll leave you in peace!

You’re welcome! Don’t hesitate to ask if you have further questions, or if your company would like to have better support: Expert Acceleration Program – Hugging Face


Hi.

I’ve been stuck here for a while and I can’t clarify myself.

How do you know what hyperparameters can be used in the pipeline? I mean:

parameters = {
                  'repetition_penalty': 4.0,
                  'length_penalty': 1.5
}

input_pipeline = {
                  "inputs":long_text,
                  "parameters":parameters
}

what other parameters are accepted in “parameters”?

I tried looking at the SummarizationPipeline class, but I can’t find such parameters there, so this is where I’m stuck… I suppose it will also depend on the exact model you use, here for example: google/pegasus-large. So my question is:

For a particular model, how do I get the exact list of accepted hyperparameters to be used in the pipeline?

Hello,

Thanks for the question. To find all the (named) parameters you need to dig a bit deeper into the code. For example, the SummarizationPipeline uses the generate method under the hood. There you can find all the available kwargs.
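Until the documentation improves, one pragmatic option is to keep a small allowlist of generate() kwargs and check a parameters dict against it before sending a request. The names below are a non-exhaustive subset taken from the transformers 4.x generate() signature; the `check_parameters` helper itself is just an illustrative sketch:

```python
# Non-exhaustive set of kwargs accepted by generate() in transformers 4.x.
KNOWN_GENERATE_KWARGS = {
    "max_length", "min_length", "do_sample", "early_stopping", "num_beams",
    "temperature", "top_k", "top_p", "repetition_penalty", "length_penalty",
    "no_repeat_ngram_size", "num_return_sequences", "bad_words_ids",
}

def check_parameters(parameters):
    """Return the keys that generate() would not recognise (empty set = OK)."""
    return set(parameters) - KNOWN_GENERATE_KWARGS

unknown = check_parameters({"repetition_penalty": 4.0, "length_penalty": 1.5})
```

This also catches typos (e.g. a misspelled "lenght_penalty") before they silently do nothing on the endpoint.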

I opened a thread internally to improve documentation on this.


Okay.

Here you suggested adding the truncation parameter. This is a parameter related to the tokenizer. Why can it be used as a “pipeline parameter”? I’m looking for a solution like:

parameters = {
                  'truncation': True,
                  'max_length': 256,
                  'padding': True,
}

But I don’t know whether the names are right or not (except truncation, since you suggested that name). It seems that the “max_length” parameter has no effect on predictions, because the predictions are exactly the same with different values.

Yes,
the parameter names would be correct, but the currently released transformers version only supports padding and truncation as **kwargs, if you take a look here:

But the next transformers version will allow all tokenizer **kwargs to be passed in.

Oops! This is very bad news :(.

I have a model for classification trained on texts truncated at 256 tokens and I need to deploy it. What would you recommend I do in that case?

Greetings!

You don’t need to provide the parameters to run inference; it should work like a charm. Padding is not required for inference.
But if you still want to set the max length to 256, you can modify the tokenizer_config to 256. Example: tokenizer_config.json · distilbert-base-uncased-finetuned-sst-2-english at main

I don’t understand well…

Why do you say I don’t need to provide the parameters at the inference step? Does a model trained with texts truncated at 256 tokens perform well on texts with more than 256 tokens?

On the other hand, where should I put this tokenizer_config…?

Not providing them for inference will definitely increase the inference speed, since the input won’t be padded and will be used as it is.
@Oigres, since the underlying original model was properly pre-trained on 512 tokens, it should perform decently well, yes.
You can add it to your model.tar.gz, but I would do some tests in advance before deploying.
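Regarding the tokenizer_config edit, a minimal sketch of what writing that file could look like before repackaging it into model.tar.gz (the file contents here are illustrative; model_max_length is the key the tokenizer reads for its length cap):

```python
import json
import os
import tempfile

# Illustrative: write a tokenizer_config.json that caps tokenized inputs
# at 256 tokens; this file would then be repackaged into model.tar.gz
# alongside the model weights.
with tempfile.TemporaryDirectory() as tmp:
    cfg_path = os.path.join(tmp, "tokenizer_config.json")
    config = {"model_max_length": 256}
    with open(cfg_path, "w") as f:
        json.dump(config, f)

    # Read it back to confirm the value round-trips.
    with open(cfg_path) as f:
        reloaded = json.load(f)
```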


Okay.

Finally, I retrained the model with 512 tokens and will deploy it as is.

Thank you so much.
