Custom DistilBertTokenizer training

Hi all,
I am trying to figure out how to train a custom tokenizer for DistilBERT; all the examples I have seen just use the pre-trained tokenizer.


Can someone point me to how to build my own custom tokenizer?
Thanks in advance. :slight_smile:

Hi, this is probably where you can start if you want to build a fast tokenizer: https://huggingface.co/docs/tokenizers/python/master/quicktour.html
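To make the quicktour more concrete for this use case, here is a minimal sketch of training a BERT-style WordPiece tokenizer (the kind DistilBERT uses) with the `tokenizers` library. The tiny in-memory corpus and the file name `my-tokenizer.json` are just placeholders, assumptions for illustration; swap in your own text files or iterator.

```python
# Train a BERT-style WordPiece tokenizer from scratch with `tokenizers`.
from tokenizers import Tokenizer, models, normalizers, pre_tokenizers, trainers

# Placeholder corpus -- replace with an iterator over your real text.
corpus = [
    "DistilBERT is a distilled version of BERT.",
    "Training a tokenizer only needs raw text.",
]

# WordPiece model plus the BERT-style normalizer and pre-tokenizer.
tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))
tokenizer.normalizer = normalizers.BertNormalizer(lowercase=True)
tokenizer.pre_tokenizer = pre_tokenizers.BertPreTokenizer()

trainer = trainers.WordPieceTrainer(
    vocab_size=30522,  # DistilBERT's default vocab size
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.train_from_iterator(corpus, trainer)

# Save the trained tokenizer to a single JSON file.
tokenizer.save("my-tokenizer.json")
```

With a real corpus you would point `train_from_iterator` (or `tokenizer.train(files=[...])`) at your own data; the special tokens above match what DistilBERT expects.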

cc @anthony and @Narsil

Thanks for the answer! I will try the example; it is a bit different from what I need, and I want to make sure I end up with something more similar to the DistilBERT tokenizer.
Thanks!

I still do not see how to load the trained tokenizer as a DistilBertTokenizer. Any ideas?