Speeding up tokenization on a large text corpus

I have a similar issue: tokenizing a large text corpus with a pretrained WordPiece tokenizer takes several hours. I'm doing:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
train_tokenized_encodings = tokenizer(
    df[df.split == "train"].text.tolist(),
    truncation=True,
    padding=True,
    max_length=MAX_LENGTH,
)

Any suggestions for speeding this up?

Is there a way to parallelize this? (Or does the above automatically use multiple workers?)
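For context on what I mean by parallelizing: something along these lines, where the corpus is split into chunks and each chunk is tokenized by a separate worker. This is only a sketch with a hypothetical stand-in `fake_tokenize` function in place of the real tokenizer call; threads are used here because Hugging Face "fast" tokenizers run in Rust and reportedly release the GIL during batch encoding, but I haven't verified that this actually helps.

```python
# Generic parallel-tokenization sketch (not my actual pipeline).
# `fake_tokenize` is a hypothetical stand-in for:
#     tokenizer(texts, truncation=True, padding=True, max_length=MAX_LENGTH)
from concurrent.futures import ThreadPoolExecutor


def fake_tokenize(texts):
    # Placeholder "tokenizer": lowercase and whitespace-split each text.
    return [t.lower().split() for t in texts]


def chunks(items, n):
    # Yield up to n roughly equal slices of `items`.
    k = (len(items) + n - 1) // n
    for i in range(0, len(items), k):
        yield items[i:i + k]


def parallel_tokenize(texts, workers=4):
    parts = list(chunks(texts, workers))
    with ThreadPoolExecutor(max_workers=workers) as ex:
        results = list(ex.map(fake_tokenize, parts))
    # Flatten the per-chunk results back into one list, in order.
    return [enc for part in results for enc in part]


if __name__ == "__main__":
    corpus = ["Hello World", "Another Example Text"] * 4
    encoded = parallel_tokenize(corpus, workers=2)
    print(len(encoded))
```

Would something like this with the real tokenizer (or multiple processes instead of threads) be the recommended route, or is there built-in support for it?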