Truncation of input data for Summarization pipeline

I could reproduce the issue and also found its root cause. The tokenizer warns:

"Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation."

And there is currently no way to pass max_length through to the inference toolkit.
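As a quick illustration, this is the pattern that triggers the warning; the checkpoint name and the text variable below are placeholders, and any model whose tokenizer_config.json does not define model_max_length will behave the same way:

from transformers import pipeline

# placeholder checkpoint: any summarization model whose tokenizer_config.json
# does not define model_max_length
summarizer = pipeline("summarization", model="<your-model>")

# truncation is requested, but the tokenizer has no maximum length,
# so transformers logs the warning above and does not truncate the input
summarizer(very_long_text, truncation=True)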

There are currently two options to solve this. You could either fork the model into your own repository and add a tokenizer_config.json similar to this one: tokenizer_config.json · distilbert-base-uncased-finetuned-sst-2-english at main.
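For example, a minimal sketch of how such a config could be produced, assuming the BART checkpoint used below and a maximum input length of 1024 tokens:

from transformers import BartTokenizer

# load the tokenizer and set the maximum input length explicitly
tokenizer = BartTokenizer.from_pretrained(
    "facebook/bart-large-cnn", model_max_length=1024
)

# save_pretrained writes a tokenizer_config.json that includes model_max_length,
# which you can then upload to your forked model repository
tokenizer.save_pretrained("./my-forked-model")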

Or you could provide a custom inference.py as entry_point when creating the HuggingFaceModel, e.g.:

from sagemaker.huggingface import HuggingFaceModel

huggingface_model = HuggingFaceModel(
    env=env,
    role=role,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
    entry_point="inference.py",
)

The inference.py then needs to contain a predict_fn and a model_fn. Pseudo code below.

from transformers import BartTokenizer, BartForConditionalGeneration


def model_fn(model_dir):
    """model_dir is the location where the model is stored."""
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
    return model, tokenizer


def predict_fn(data, model):
    """model is the return of model_fn and data is the JSON from the request as a Python dict."""
    model, tokenizer = model

    # tokenize the request text and truncate it to the model's maximum input length
    inputs = tokenizer(
        data["inputs"], max_length=1024, truncation=True, return_tensors="pt"
    )

    outputs = model.generate(
        inputs["input_ids"],
        max_length=300,
        min_length=100,
        length_penalty=2.0,
        num_beams=4,
        early_stopping=True,
    )

    summary = " ".join(
        tokenizer.decode(
            g, skip_special_tokens=True, clean_up_tokenization_spaces=False
        )
        for g in outputs
    )
    return {"summary": summary}

You can find more documentation here: Deploy models to Amazon SageMaker
