I’m using bart-large-cnn for a summarization task. I have been truncating the input text to avoid exceeding the maximum sequence length:
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained(
    "facebook/bart-large-cnn"
)
inputs = tokenizer(
    input_text, return_tensors="pt", max_length=1024, truncation=True
)
outputs = model.generate(
    inputs["input_ids"],
    max_length=300,
    min_length=100,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True,
)
summary = " ".join(
    tokenizer.decode(
        g, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    for g in outputs
)
How can I replicate this in SageMaker? I don’t see a way to pass configuration values to the tokenizer when calling predict on an instance of HuggingFaceModel. Here is my code so far:
from sagemaker.huggingface import HuggingFaceModel

env = {
    "HF_MODEL_ID": "facebook/bart-large-cnn",
    "HF_TASK": "summarization",
}

huggingface_model = HuggingFaceModel(
    env=env,
    role=role,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
)

# deploy to a real-time endpoint (the instance type here is just an example)
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)

data = {
    "inputs": input_text,
    "parameters": {
        "max_length": 300,
        "min_length": 100,
        "length_penalty": 2.0,
        "num_beams": 4,
    },
}

result = predictor.predict(data)
Thank you!
Hello @alistair,
You can also provide tokenizer kwargs as parameters. For example:
data = {
    "inputs": input_text,
    "parameters": {
        "max_length": 300,
        "min_length": 100,
        "length_penalty": 2.0,
        "num_beams": 4,
        "truncation": True,
    },
}
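For reference, a minimal sketch of sending this payload to the endpoint (assuming predictor is the object returned by huggingface_model.deploy() and that the summarization pipeline returns a list of dicts with a summary_text field):

result = predictor.predict(data)
print(result[0]["summary_text"])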
Thanks for your suggestion @philschmid. This isn’t working for some reason. I am continuing to get the “index out of range in self” error thrown when the maximum sequence length is exceeded.
I could reproduce the issue and found its root cause. The tokenizer logs this warning:
Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.
And there is currently no way to pass max_length to the inference toolkit.
There are currently two options to solve this. You could fork the model into your own repository and add a tokenizer_config.json similar to this one: tokenizer_config.json · distilbert-base-uncased-finetuned-sst-2-english at main.
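A minimal sketch of what that file could contain for bart-large-cnn (the exact set of keys is an assumption; the relevant one is model_max_length, which tells the tokenizer where to truncate):

{
  "model_max_length": 1024
}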
Alternatively, you could provide a custom inference.py as the entry_point when creating the HuggingFaceModel.
e.g.
huggingface_model = HuggingFaceModel(
    env=env,
    role=role,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
    entry_point="inference.py",
)
The inference.py then needs to contain a model_fn and a predict_fn. Pseudo code below.
from transformers import BartForConditionalGeneration, BartTokenizer


def model_fn(model_dir):
    """model_dir is the location where the model is stored."""
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    """model_and_tokenizer is the return value of model_fn; data is the request JSON as a Python dict."""
    model, tokenizer = model_and_tokenizer
    # tokenize with truncation so long inputs no longer raise "index out of range in self"
    inputs = tokenizer(
        data["inputs"], return_tensors="pt", max_length=1024, truncation=True
    )
    outputs = model.generate(
        inputs["input_ids"],
        max_length=300,
        min_length=100,
        length_penalty=2.0,
        num_beams=4,
        early_stopping=True,
    )
    summary = " ".join(
        tokenizer.decode(
            g, skip_special_tokens=True, clean_up_tokenization_spaces=False
        )
        for g in outputs
    )
    return {"summary": summary}
You can find more documentation here: Deploy models to Amazon SageMaker
Thank you @philschmid for explaining the cause of the problem and for providing two good solutions.