I am interested in deploying a Hugging Face model on AWS SageMaker, say for example “google/pegasus-large”. You have very generously shared the deployment code shown below. I was wondering whether the predict function accepts additional arguments: I would like to pass a custom length penalty as well as a repetition penalty. Could you share where in this code these arguments would be inserted?
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'google/pegasus-large',
    'HF_TASK': 'summarization'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,     # number of instances
    instance_type='ml.m5.xlarge'  # ec2 instance type
)

predictor.predict({
    'inputs': "The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct."
})
This is what the inference looks like right now in my notebook (the code above).
Nice to hear that you want to use Amazon SageMaker to deploy your summarization model, and yes, it is possible.
The Inference Toolkit supports the same functionality as the transformers pipelines. You can provide all of your additional prediction parameters in the parameters attribute of the request. I attached an example below of what the request body looks like.
{
  "inputs": "Hugging Face, the winner of VentureBeat’s Innovation in Natural Language Process/Understanding Award for 2021, is looking to level the playing field. The team, launched by Clément Delangue and Julien Chaumond in 2016, was recognized for its work in democratizing NLP, the global market value for which is expected to hit $35.1 billion by 2026. This week, Google’s former head of Ethical AI Margaret Mitchell joined the team.",
  "parameters": {
    "repetition_penalty": 4.0,
    "length_penalty": 1.5
  }
}
Here is an end-to-end example for google/pegasus-large showing how to deploy it and use your parameters.
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'google/pegasus-large',
    'HF_TASK': 'summarization'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,     # number of instances
    instance_type='ml.m5.xlarge'  # ec2 instance type
)

long_text = """
Hugging Face, the winner of VentureBeat’s Innovation in Natural Language Process/Understanding Award for 2021,
is looking to level the playing field. The team, launched by Clément Delangue and Julien Chaumond in 2016,
was recognized for its work in democratizing NLP, the global market value for which is expected to
hit $35.1 billion by 2026. This week, Google’s former head of Ethical AI Margaret Mitchell joined the team.
"""

parameters = {'repetition_penalty': 4.0, 'length_penalty': 1.5}

predictor.predict({"inputs": long_text, "parameters": parameters})
Quick question if you don’t mind: are there any arguments in the deployment script to specify tags? My cloud governance team is requiring that I specify tags, otherwise it triggers an alert. By tags, I am referring to Manage tags in AWS Resource Groups and Tags.
Yes, you can add tags to all related “services” for inference, meaning, in the console:
Models
Endpoint Configurations
Endpoints
To add these tags, pass a tags argument to the .deploy() method: a list of dictionaries with Key/Value pairs. See below for an example.
# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,      # number of instances
    instance_type='ml.m5.xlarge',  # ec2 instance type
    tags=[{'Key': 'STAGE', 'Value': 'Production'},
          {'Key': 'MAINTAINER', 'Value': 'ujjrox'}]
)
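If you want to confirm that the tags landed on the endpoint, a minimal sketch (assuming the predictor object from the snippet above) is to look up the endpoint ARN with boto3 and list its tags:

import boto3

sm_client = boto3.client("sagemaker")

# resolve the endpoint ARN from the deployed predictor, then list its tags
endpoint_arn = sm_client.describe_endpoint(
    EndpointName=predictor.endpoint_name
)["EndpointArn"]
print(sm_client.list_tags(ResourceArn=endpoint_arn)["Tags"])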
What other parameters are accepted in “parameters”?
I tried looking at the SummarizationPipeline class, but I can’t find such parameters, so that is where I am stuck… I suppose it will also depend on the exact model you are using, here for example google/pegasus-large. So my question is:
For a particular model, how do I get the exact list of accepted hyperparameters that can be used in the pipeline?
Thanks for the question. To find all (named) parameters you need to dig a bit deeper into the code. For example, the SummarizationPipeline uses the generate method under the hood; there you can find all available kwargs.
I opened a thread internally to improve documentation on this.
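If you want to inspect those kwargs locally, here is a minimal sketch (assuming transformers is installed in your notebook) that simply prints the signature of the generate method the pipeline calls:

import inspect
from transformers import AutoModelForSeq2SeqLM

# the SummarizationPipeline calls model.generate() under the hood, so its
# signature lists the named generation kwargs (length_penalty,
# repetition_penalty, num_beams, ...)
model = AutoModelForSeq2SeqLM.from_pretrained("google/pegasus-large")
print(inspect.signature(model.generate))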
Here you suggested that I add the truncation parameter. This is a parameter related to the tokenizer; why can it be used as a “pipeline parameter”? I’m looking for a solution like:
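(a rough sketch of the kind of request I have in mind; apart from truncation, the parameter names are just my guesses)

parameters = {
    "truncation": True,   # the parameter you suggested
    "max_length": 256     # guessing at this name
}
predictor.predict({"inputs": long_text, "parameters": parameters})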
But I don’t know whether the names are right (except truncation, since you suggested that name). It also seems that the “max_length” parameter has no effect on the predictions, because the predictions are exactly the same with different values.
Why do you say I don’t need to provide the parameters at the inference step? Does a model trained with texts truncated at 256 tokens make good predictions on texts with more than 256 tokens?
Not providing them for inference will definitely increase the inference speed, since the input won’t be padded and will be used as is. @Oigres, since the underlying original model was properly pre-trained on 512 tokens, it should perform decently well, yes.
You can add it into your model.tar.gz, but I would do some tests in advance before deploying.
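For reference, here is a rough sketch of one way to package the model together with its tokenizer into a model.tar.gz (my own sketch, not an official recipe; adjust the model ID and paths to your fine-tuned checkpoint, and verify the behaviour before deploying):

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/pegasus-large"  # placeholder: use your fine-tuned checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id, model_max_length=512)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# save both into one directory, then package it as model.tar.gz
tokenizer.save_pretrained("export")
model.save_pretrained("export")
# cd export && tar zcvf model.tar.gz *   -> upload to S3 and pass it as model_data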
I’m getting the following error with a long input:
ModelError: An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (400) from model with message "{
  "code": 400,
  "type": "InternalServerException",
  "message": "The size of tensor a (577) must match the size of tensor b (512) at non-singleton dimension 1"
}
We retrained and fine-tuned an ALBERT xxlarge classification model and used the Hugging Face SageMaker toolkit to deploy it. It works; however, we get the above error with long text.
Here is the deployment code.
task_env = {
    'TASK': 'text-classification',
    'HF_TASK': 'text-classification'  # NLP task you want to use for predictions
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    env=task_env,
    model_data=saved_model,
    enable_network_isolation=True,
    transformers_version="4.6",  # transformers version used
    pytorch_version="1.7",       # pytorch version used
    py_version="py36",           # python version of the DLC
)
from sagemaker.huggingface import HuggingFaceModel
import sagemaker

role = sagemaker.get_execution_role()

# Hub Model configuration. https://huggingface.co/models
hub = {
    'HF_MODEL_ID': 'textattack/albert-base-v2-imdb',
    'HF_TASK': 'text-classification'
}

# create Hugging Face Model Class
huggingface_model = HuggingFaceModel(
    transformers_version='4.6.1',
    pytorch_version='1.7.1',
    py_version='py36',
    env=hub,
    role=role,
)

# deploy model to SageMaker Inference
predictor = huggingface_model.deploy(
    initial_instance_count=1,     # number of instances
    instance_type='ml.m5.xlarge'  # ec2 instance type
)
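Given the tensor-size error above, one thing worth trying is to request truncation at prediction time, in line with the truncation parameter discussed earlier in this thread. This is only a sketch; whether the pipeline honours the parameter can depend on the transformers version in the DLC, so test it against your endpoint first.

# ask the pipeline to truncate inputs that exceed the model's 512-token limit
long_review = "..."  # placeholder for a text longer than 512 tokens
predictor.predict({
    "inputs": long_review,
    "parameters": {"truncation": True}
})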