Hi! I would like to train a SentencePiece tokenizer from scratch, but I’m a bit lost in the documentation and don’t know where to start. There are already examples of how to train a BPE tokenizer on the Hugging Face website, but I don’t know whether I can simply transfer them one-to-one. I also don’t know where to find the trainable class for SentencePiece.
Has anyone here already trained a SentencePiece tokenizer?

