How to build a multilingual tokenizer from scratch (for mBART)

I have read the docs on building a tokenizer from scratch, but I cannot find any information about multilingual tokenizers. Does anybody have any suggestions?