I could reproduce the issue and also found its root cause. The warning

"Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation."

is raised because the tokenizer has no predefined maximum length, and there is currently no way to pass a max_length to the inference toolkit.
There are two options to solve this. You could either fork the model into your own repository and add a tokenizer_config.json, similar to this one: tokenizer_config.json · distilbert-base-uncased-finetuned-sst-2-english at main.
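For illustration, a minimal tokenizer_config.json could look like the sketch below. model_max_length is the field the tokenizer reads as its predefined maximum; the value 1024 is an assumption for facebook/bart-large-cnn, so adjust it for your model:

{
  "model_max_length": 1024
}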
Or you could provide a custom inference.py as entry_point when creating the HuggingFaceModel, e.g.
huggingface_model = HuggingFaceModel(
    env=env,
    role=role,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
    entry_point="inference.py",
)
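Here, env and role are assumed to be defined beforehand: env as a dict of environment variables for the container and role as your SageMaker execution IAM role.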
The inference.py then needs to contain a model_fn and a predict_fn. Pseudo code below.
from transformers import BartForConditionalGeneration, BartTokenizer


def model_fn(model_dir):
    """model_dir is the location where the model is stored"""
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
    return model, tokenizer


def predict_fn(data, model):
    """model is the return of model_fn and data is the JSON from the request as a Python dict"""
    model, tokenizer = model
    # Assuming the request payload looks like {"inputs": "text to summarize"}.
    # Tokenizing with an explicit max_length avoids the truncation warning.
    inputs = tokenizer(
        data["inputs"], truncation=True, max_length=1024, return_tensors="pt"
    )
    outputs = model.generate(
        inputs["input_ids"],
        max_length=300,
        min_length=100,
        length_penalty=2.0,
        num_beams=4,
        early_stopping=True,
    )
    summary = " ".join(
        tokenizer.decode(g, skip_special_tokens=True, clean_up_tokenization_spaces=False)
        for g in outputs
    )
    return {"summary": summary}
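Once the model is created, deploying and invoking it could look like the sketch below; the instance type and the request text are just placeholders:

predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.m5.xlarge",
)

result = predictor.predict({"inputs": "Long article text to summarize ..."})
print(result["summary"])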
You can find more documentation here: Deploy models to Amazon SageMaker