Custom DistilBertTokenizer training

Hi all,
I am trying to find out how to build a custom tokenizer for DistilBERT. All the examples I have seen just use the pre-trained tokenizer.

Can someone point me to how to build my custom model?
Thanks in advance. :slight_smile:

Hi, this is probably where you can start if you want to build a fast tokenizer:

cc @anthony and @Narsil

Thanks for the answer! I will try the example, but it is a bit different, and I wanted to make sure I end up with something more similar to the DistilBERT tokenizer.

I still do not see how to build this as a DistilBertTokenizer. Any ideas?
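
In case it helps, here is a minimal sketch of one way this could be done. It is not necessarily what the linked example does. It relies on the fact that DistilBERT uses the same WordPiece tokenization as BERT, so a WordPiece vocabulary trained with the `tokenizers` library can be loaded through `DistilBertTokenizerFast`. The toy corpus, the vocabulary size, and the temporary output directory are placeholders for illustration only, and the `vocab_file` loading path assumes a transformers 4.x style API.

```python
import os
import tempfile

from tokenizers import BertWordPieceTokenizer
from transformers import DistilBertTokenizerFast

# Placeholder corpus; in practice, iterate over your own text data.
corpus = [
    "custom tokenizers are trained on domain text",
    "distilbert uses wordpiece tokenization like bert",
    "training a tokenizer builds a new vocabulary",
]

# Train a BERT-style WordPiece tokenizer from scratch.
# vocab_size and min_frequency are illustrative values, not recommendations.
wordpiece = BertWordPieceTokenizer(lowercase=True)
wordpiece.train_from_iterator(
    corpus,
    vocab_size=200,
    min_frequency=1,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
    show_progress=False,
)

# save_model writes vocab.txt into the given directory.
out_dir = tempfile.mkdtemp()
wordpiece.save_model(out_dir)
vocab_path = os.path.join(out_dir, "vocab.txt")

# Load the trained vocabulary as a DistilBERT tokenizer.
distil_tokenizer = DistilBertTokenizerFast(vocab_file=vocab_path, do_lower_case=True)

encoded = distil_tokenizer("custom tokenizers for distilbert")
print(distil_tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
```

Calling `distil_tokenizer.save_pretrained(some_dir)` afterwards should let you reload it later with `DistilBertTokenizerFast.from_pretrained(some_dir)`. Keep in mind that a new vocabulary does not match the pre-trained DistilBERT weights, so the model would have to be trained from scratch (or at least have its embeddings retrained) with this tokenizer.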