Average time to train a SentencePieceBPETokenizer

Hello,

I am training my own SentencePieceBPETokenizer on the Java CodeSearchNet dataset. I know that sp_tokenizer.train_from_iterator does not show a progress bar in Colab. However, I would like to know how far along the training is. So, I have two questions:

  1. Do you know a workaround to display the progress bar in Colab?
  2. Approximately how long does a tokenizer take to train?
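One possible workaround for both questions, sketched under assumptions: wrap the iterator you pass to train_from_iterator in a small generator that prints a count, the elapsed time, the throughput and (if you know the total) an ETA. `with_progress` is a hypothetical helper written for this example. It uses plain `print` by default, which should show up in a Colab cell even though the Rust progress bar does not. It only measures how fast the trainer reads the data. As far as I know, the BPE trainer reads the whole iterator before it computes the merges, so the final merge stage after the last message is not covered.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def with_progress(
    items: Iterable[T],
    total: Optional[int] = None,
    every: int = 1000,
    report: Callable[[str], None] = print,
) -> Iterator[T]:
    """Yield `items` unchanged, reporting progress every `every` items.

    Hypothetical helper: `total` is optional; when it is given, the report
    also includes an ETA extrapolated from the current throughput.
    """
    start = time.perf_counter()
    count = 0
    for item in items:
        count += 1  # number of items handed to the consumer so far
        if count % every == 0 or count == total:
            elapsed = time.perf_counter() - start
            rate = count / elapsed if elapsed > 0 else float("inf")
            msg = f"{count}" if total is None else f"{count}/{total}"
            msg += f" items | {elapsed:.1f}s elapsed | {rate:.0f} items/s"
            if total is not None and rate != float("inf"):
                msg += f" | ETA {(total - count) / rate:.0f}s"
            report(msg)
        yield item
```

Usage (assumed, not run here): `sp_tokenizer.train_from_iterator(with_progress(texts, total=len(texts), every=10_000), vocab_size=...)`, where `texts` is your list of Java functions. The items/s figure lets you extrapolate how long reading the full dataset will take. Recent versions of `tokenizers` also document a `length` argument on `train_from_iterator` for progress tracking; check whether your installed version supports it.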

Thank you
