All tokenizers offer this functionality; just pass the list of sequences to the tokenizer:
tokens = tokenizer([s1, s2])["input_ids"]
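Note that by default the sequences are not padded, so the lists in `input_ids` keep their individual lengths. Pass `padding=True` to pad all the sequences to the maximum length in the batch when they differ in length. You can find more detailed info in this guide.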
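
To make "padding to the maximum length in the batch" concrete, here is a minimal plain-Python sketch of the resulting shape. It is not the library's implementation: `pad_batch` is a hypothetical helper, the token ids are made up, and a pad id of 0 with right-side padding is an assumption (the real pad token and padding side depend on the tokenizer).

```python
def pad_batch(batch, pad_id=0):
    """Right-pad every sequence in `batch` to the length of the longest one.

    Hypothetical helper for illustration only; pad_id=0 is an assumption.
    """
    # Length of the longest sequence in this batch (0 for an empty batch).
    max_len = max((len(seq) for seq in batch), default=0)

    # Append pad ids so that every sequence reaches max_len.
    input_ids = [seq + [pad_id] * (max_len - len(seq)) for seq in batch]

    # 1 marks a real token, 0 marks a padding position.
    attention_mask = [
        [1] * len(seq) + [0] * (max_len - len(seq)) for seq in batch
    ]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token ids standing in for two already-tokenized sequences.
batch = [[101, 7592, 102], [101, 7592, 2088, 999, 102]]
padded = pad_batch(batch)
print(padded["input_ids"])
print(padded["attention_mask"])
```

The attention mask is what lets a model ignore the padded positions, so it is usually passed along with `input_ids`.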