Why does padding = 'max_length' cause much slower model inference?

I trained a bert-base-uncased AutoModelForSequenceClassification model and found that inference is at least 2x faster if I comment out padding = 'max_length' in the encoding step. My understanding is that BERT expects a fixed length of 512 tokens, so doesn't that imply the input must be padded to 512? Please help me understand.

sequence = tokenizer.encode_plus(question,
                                 passage,
                                 max_length=256,
                                 padding='max_length',
                                 truncation="longest_first",
                                 return_tensors="pt")['input_ids'].to(device)