Tokenizer progress bar

Hi Great Community,

Is it possible to have a progress bar to track the tokenisation process when calling the following method?

tokenizer(large_batch, padding=True, truncation=True, max_length=512)


I would also like a progress bar for tokenizing! Maybe a verbose setting? Did you ever hear back about this?

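The tqdm library lets you show a progress bar while tokenizing. Wrap the loop over your texts in tqdm and tokenize one text per call. Note that this is typically slower than a single batched call, and it returns a list of per-text encodings rather than one batched encoding.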

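from tqdm import tqdm

# Assumes `tokenizer` is already loaded, e.g. with AutoTokenizer.from_pretrained(...).
def tokenizer_with_progress(large_batch):
    tokenized_texts = []
    for text in tqdm(large_batch, desc="Tokenizing", unit="text"):
        # padding=True pads to the longest sequence in the call; with a single
        # text per call it adds no padding, so pad later when forming batches.
        tokenized_texts.append(tokenizer(text, padding=True, truncation=True, max_length=512))
    return tokenized_texts  # list with one encoding per input text

train_encodings = tokenizer_with_progress(training_sentences)
test_encodings = tokenizer_with_progress(testing_sentences)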
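
If one call per text turns out to be too slow, a possible variation is to tokenize in chunks and advance the bar once per chunk. The sketch below is only an illustration, not part of the tokenizers API: tokenize_in_chunks, tokenize_fn, and chunk_size are hypothetical names, and it assumes tokenize_fn returns a mapping with "input_ids" and "attention_mask" lists, as Hugging Face tokenizers usually do for a list of texts.

```python
from tqdm import tqdm


def tokenize_in_chunks(tokenize_fn, texts, chunk_size=1000):
    """Tokenize `texts` chunk by chunk, with a progress bar counted in texts.

    `tokenize_fn` is a hypothetical callable that takes a list of strings and
    returns a mapping with "input_ids" and "attention_mask" lists.
    """
    input_ids, attention_mask = [], []
    with tqdm(total=len(texts), desc="Tokenizing", unit="text") as bar:
        for start in range(0, len(texts), chunk_size):
            chunk = texts[start:start + chunk_size]
            encoded = tokenize_fn(chunk)  # one batched call per chunk
            input_ids.extend(encoded["input_ids"])
            attention_mask.extend(encoded["attention_mask"])
            bar.update(len(chunk))  # advance by the number of texts processed
    return {"input_ids": input_ids, "attention_mask": attention_mask}
```

With a loaded tokenizer, this could be called as tokenize_in_chunks(lambda batch: tokenizer(batch, padding="max_length", truncation=True, max_length=512), training_sentences). Padding to max_length is used here because padding=True would pad each chunk only to its own longest sequence, leaving different chunks with different lengths.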