Why are padding and truncation optional?

martinmin · January 8, 2023, 7:28am

In this page:

It says that the tokenizer API can be called with no padding or truncation, as shown below:

* `False` or `'do_not_pad'`: no padding is applied. This is the default behavior.
* `False` or `'do_not_truncate'`: no truncation is applied. This is the default behavior.
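
To make the two defaults concrete, here is a minimal sketch in plain Python. `encode_batch` is a hypothetical stand-in, not the tokenizer's API: it works on lists of token ids that are already tokenized, and it assumes that `padding=True` means "pad to the longest sequence in the batch" and that truncation cuts each sequence to `max_length`. The token ids and the `pad_id` value are made up for illustration.

```python
from typing import List, Optional


def encode_batch(
    batch: List[List[int]],
    padding: bool = False,
    truncation: bool = False,
    max_length: Optional[int] = None,
    pad_id: int = 0,
) -> List[List[int]]:
    """Hypothetical helper that mimics the padding/truncation switches."""
    out = [list(ids) for ids in batch]

    # Truncation: cut every sequence down to at most max_length ids.
    # (A plain slice; a real tokenizer also takes care of special tokens.)
    if truncation and max_length is not None:
        out = [ids[:max_length] for ids in out]

    # Padding: extend every sequence to the length of the longest one.
    if padding and out:
        target = max(len(ids) for ids in out)
        out = [ids + [pad_id] * (target - len(ids)) for ids in out]

    return out


# Two made-up sequences of different lengths.
batch = [[101, 7592, 102], [101, 7592, 2088, 2003, 102]]

default = encode_batch(batch)                # both switches off
padded = encode_batch(batch, padding=True)   # equal lengths via padding
truncated = encode_batch(batch, truncation=True, max_length=4)
both = encode_batch(batch, padding=True, truncation=True, max_length=4)
```

With both switches off, the sequences come back with their original, unequal lengths, so nothing in this sketch produces same-length sequences unless padding is turned on.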

Aren’t padding and truncation always necessary to ensure that all sequences have the same length? I don’t understand why the defaults are `'do_not_pad'` and `'do_not_truncate'`.
