Variable length batch decoding

All tokenizers support this: just pass the list of sequences to the tokenizer.

tokens = tokenizer([s1, s2], padding=True)["input_ids"]

With padding=True, sequences of different lengths are padded to the longest one in the batch. Without it (the default), no padding is applied and each sequence keeps its own length. You can find more detailed info in this guide
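
To make "pad to the longest sequence in the batch" concrete, here is a minimal plain-Python sketch of the idea. It is not the library's implementation: pad_batch is a hypothetical helper, the token ids are made up for illustration, and the pad id of 0 and right-side padding are assumptions (a real tokenizer uses its own pad token id and configured padding side).

```python
from typing import Dict, List


def pad_batch(seqs: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Right-pad every sequence to the length of the longest one in the batch.

    Hypothetical helper for illustration only; pad_id=0 is an assumption.
    """
    max_len = max((len(s) for s in seqs), default=0)
    # Append pad_id until each sequence reaches max_len.
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    # 1 marks a real token, 0 marks padding.
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Two already-tokenized sequences of different lengths (illustrative ids).
batch = pad_batch([[101, 7592, 102], [101, 7592, 2088, 999, 102]])
print(batch["input_ids"])       # [[101, 7592, 102, 0, 0], [101, 7592, 2088, 999, 102]]
print(batch["attention_mask"])  # [[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]]
```

The attention mask matters as much as the padded ids: it is what lets the model ignore the padding positions, so when feeding a padded batch to a model you would normally pass both.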