T5 model tokenizer

Do T5 models use BPE tokenizers? Is it possible to use another type of tokenizer with a T5 model, or are they designed to work only with BPE?

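AFAIK T5 uses SentencePiece (GitHub - google/sentencepiece: Unsupervised text tokenizer for Neural Network-based text generation), which implements both BPE and the unigram language model. As far as I know, the vocabulary shipped with the pretrained T5 checkpoints is a SentencePiece unigram model rather than BPE, so a pretrained T5 depends on that specific tokenizer and vocabulary.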

Why would you like to use another tokenizer?

If you’re training from scratch, then you would typically train a tokenizer on your own data. In that case you can choose the tokenizer training algorithm (BPE, WordPiece or Unigram if you’re using 🤗 Tokenizers) and how to preprocess the data before tokenizing it. I can recommend this chapter of the HF course to learn more about tokenizers: Introduction - Hugging Face Course
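To make "training a tokenizer on your own data" a bit more concrete, here is a toy, pure-Python sketch of how BPE learns merge rules from a corpus. It is only an illustration of the idea: the function names (`train_bpe`, `merge_pair`, `tokenize`) and the tiny corpus are made up for this example, and real libraries such as 🤗 Tokenizers or SentencePiece add normalization, pre-tokenization, special tokens and much faster implementations.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with the merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from a list of strings (toy version)."""
    # Split on whitespace and start with every word as a sequence of characters.
    word_freqs = Counter(tuple(word) for line in corpus for word in line.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by how often each word occurs.
        pair_counts = Counter()
        for symbols, freq in word_freqs.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        # Pick the most frequent pair; ties are broken by the pair itself
        # so the result is deterministic.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        # Apply the new merge to every word in the vocabulary.
        new_freqs = Counter()
        for symbols, freq in word_freqs.items():
            new_freqs[merge_pair(symbols, best)] += freq
        word_freqs = new_freqs
    return merges


def tokenize(word, merges):
    """Tokenize a single word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Hypothetical toy corpus, just for demonstration.
corpus = ["low low low low low", "lower lower", "newest newest newest", "widest widest"]
merges = train_bpe(corpus, num_merges=5)
print(merges)
print(tokenize("lowest", merges))
```

The point for the original question: the vocabulary that comes out of this training step is what the model's embedding table is indexed by, so swapping the tokenizer under a pretrained T5 would break the mapping between token IDs and embeddings. Choosing a different algorithm really only makes sense when you also train (or at least re-train the embeddings of) the model.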