Truncation of input data for Summarization pipeline

I’m using bart-large-cnn for a summarization task. I have been truncating the input text to avoid exceeding the model's maximum sequence length:

from transformers import BartTokenizer, BartForConditionalGeneration

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
model = BartForConditionalGeneration.from_pretrained(
    "facebook/bart-large-cnn"
)

inputs = tokenizer(
    input_text, return_tensors="pt", max_length=1024, truncation=True
)

outputs = model.generate(
    inputs["input_ids"],
    max_length=300,
    min_length=100,
    length_penalty=2.0,
    num_beams=4,
    early_stopping=True,
)

summary = " ".join(
    tokenizer.decode(
        g, skip_special_tokens=True, clean_up_tokenization_spaces=False
    )
    for g in outputs
)

How can I replicate this in SageMaker? I don’t see a way to pass configuration values to the tokenizer when calling predict on an instance of HuggingFaceModel. Here is my code so far:

from sagemaker.huggingface import HuggingFaceModel

env = {
    "HF_MODEL_ID": "facebook/bart-large-cnn",
    "HF_TASK": "summarization",
}

huggingface_model = HuggingFaceModel(
    env=env,
    role=role,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
)

# deploy the model to a real-time endpoint so predict() can be called
# (the instance type below is just an example)
predictor = huggingface_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g4dn.xlarge",
)

data = {
    "inputs": input_text,
    "parameters": {
        "max_length": 300,
        "min_length": 100,
        "length_penalty": 2.0,
        "num_beams": 4,
    }
}

result = predictor.predict(data)

Thank you!


Hello @alistair,

You can also provide tokenizer kwargs as parameters.
For example:

data = {
    "inputs": input_text,
    "parameters": {
        "max_length": 300,
        "min_length": 100,
        "length_penalty": 2.0,
        "num_beams": 4,
        "truncation": True,
    }
}

Thanks for your suggestion @philschmid. Unfortunately it isn’t working: I am still getting the “index out of range in self” error whenever the maximum sequence length is exceeded.


I could reproduce the issue and found its root cause. The tokenizer warns:

Asking to truncate to max_length but no maximum length is provided and the model has no predefined maximum length. Default to no truncation.

And there is currently no way to pass in the max_length to the inference toolkit.
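To see what the inference toolkit runs into, here is a rough local reproduction; overriding model_max_length is purely an assumption made to mimic a tokenizer with no predefined limit:

from transformers import BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")

# mimic a tokenizer with no predefined maximum length, which is what the
# inference toolkit ends up with here
tokenizer.model_max_length = int(1e30)

long_text = "a very long article about something " * 2000  # stand-in input

# truncation=True without an explicit max_length now logs the warning quoted
# above and returns the full, untruncated sequence; feeding that to the model
# is what raises "index out of range in self" in the position embeddings
input_ids = tokenizer(long_text, truncation=True, return_tensors="pt")["input_ids"]
print(input_ids.shape)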

There are currently two options to solve this. The first is to fork the model into your own repository and add a tokenizer_config.json similar to this one: tokenizer_config.json · distilbert-base-uncased-finetuned-sst-2-english at main (a minimal sketch follows below).
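Purely as an illustration, the key field is the tokenizer's maximum length; the exact set of fields is an assumption here, so mirror the referenced file when forking. Written from Python for convenience:

import json

# minimal tokenizer_config.json declaring the maximum sequence length so that
# truncation=True knows where to cut
with open("tokenizer_config.json", "w") as f:
    json.dump({"model_max_length": 1024}, f, indent=2)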

The second option is to provide a custom inference.py as entry_point when creating the HuggingFaceModel, e.g.:

huggingface_model = HuggingFaceModel(
    env=env,
    role=role,
    transformers_version="4.6",
    pytorch_version="1.7",
    py_version="py36",
    entry_point="inference.py",
)

The inference.py then needs to contain a model_fn and a predict_fn. Pseudo code below:

from transformers import BartTokenizer, BartForConditionalGeneration


def model_fn(model_dir):
    """model_dir is the location where the model is stored"""
    tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
    model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
    return model, tokenizer


def predict_fn(data, model_and_tokenizer):
    """model_and_tokenizer is the return of model_fn; data is the JSON from the request as a Python dict"""
    model, tokenizer = model_and_tokenizer

    # tokenize the request text, truncating to the model's maximum input length
    inputs = tokenizer(
        data["inputs"], return_tensors="pt", max_length=1024, truncation=True
    )

    outputs = model.generate(
        inputs["input_ids"],
        max_length=300,
        min_length=100,
        length_penalty=2.0,
        num_beams=4,
        early_stopping=True,
    )

    summary = " ".join(
        tokenizer.decode(
            g, skip_special_tokens=True, clean_up_tokenization_spaces=False
        )
        for g in outputs
    )
    return {"summary": summary}
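With this in place the generation settings live server-side, so the request payload only needs the raw text. An illustrative call, assuming the endpoint has been deployed from the huggingface_model above:

# invoke the endpoint backed by the custom inference.py
result = predictor.predict({"inputs": input_text})
print(result["summary"])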

You can find more documentation here: Deploy models to Amazon SageMaker


Thank you @philschmid for explaining the cause of the problem and for providing two good solutions.