Purpose of padding and truncating

Hi @aclifton314,
Padding:
Padding is used to make all examples the same length so that they can be packed into a batch; sequences of uneven length can't be batched together. So if a sequence is shorter than your max length, padding is used to extend it. Also, some models expect fixed-length input, so padding helps there too.
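Here's a minimal sketch of what that looks like with a tokenizer (assuming the `bert-base-uncased` checkpoint and the two example sentences, which are just placeholders):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

sentences = ["Hello world", "A somewhat longer example sentence"]

# padding=True pads every sequence in the batch to the length of the longest one,
# so the resulting tensors are rectangular and can be batched
batch = tokenizer(sentences, padding=True, return_tensors="pt")
print(batch["input_ids"].shape)   # both rows now have the same length
print(batch["attention_mask"])    # 0s mark the padded positions the model should ignore
```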

Truncation:
Most models have a max_length defined for them (there are exceptions: models with relative attention can take arbitrarily long sequences). For example, for BERT max_length is 512, so if one of your sequences is longer than that you can't feed it in directly; you need to truncate (drop the extra tokens) to make the sequence fit.
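A quick sketch of truncation, again assuming `bert-base-uncased` and a made-up overly long input:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

long_text = "word " * 1000  # far longer than BERT's 512-token limit

# truncation=True drops any tokens beyond max_length so the sequence fits the model
encoded = tokenizer(long_text, truncation=True, max_length=512)
print(len(encoded["input_ids"]))  # 512
```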

Hope this helps :slight_smile:
