Train a tokenizer from scratch on Indic languages

How can I train my own tokenizer model from scratch on Indic languages such as Tamil, Telugu, and Kannada, and what are the steps involved?
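
To make the question concrete, here is a minimal sketch of what "training from scratch" could look like, assuming a byte-pair-encoding (BPE) style tokenizer; the post does not name an algorithm, so BPE is only one possible choice. The steps it shows are: normalize the text to Unicode NFC, split on whitespace, start from single code points, and repeatedly merge the most frequent adjacent pair. The function names (`train_bpe`, `merge_pair`, `tokenize`) and the tiny sample corpus are hypothetical and for illustration only. In practice, a library such as Hugging Face `tokenizers` or SentencePiece would probably be used on a much larger corpus.

```python
import unicodedata
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def train_bpe(corpus, num_merges):
    """Learn up to `num_merges` merge rules from an iterable of text lines."""
    # NFC normalization makes canonically equivalent sequences identical,
    # which matters for scripts that use combining vowel signs.
    word_counts = Counter()
    for line in corpus:
        word_counts.update(unicodedata.normalize("NFC", line).split())

    # Each word starts as a tuple of single code points.
    words = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, count in words.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)
        merges.append(best)
        words = {merge_pair(symbols, best): c for symbols, c in words.items()}
    return merges


def tokenize(text, merges):
    """Split `text` into subword tokens by replaying the learned merges in order."""
    tokens = []
    for word in unicodedata.normalize("NFC", text).split():
        symbols = tuple(word)
        for pair in merges:
            symbols = merge_pair(symbols, pair)
        tokens.extend(symbols)
    return tokens


# Hypothetical toy corpus (Tamil, Telugu, Kannada); a real one would be far larger.
sample_corpus = ["தமிழ் மொழி", "తెలుగు భాష", "ಕನ್ನಡ ಭಾಷೆ", "தமிழ் மொழி"]
sample_merges = train_bpe(sample_corpus, num_merges=20)
sample_tokens = tokenize("தமிழ் மொழி", sample_merges)
```

Starting from code points rather than bytes keeps each vowel sign or virama as a single unit in the base vocabulary. Whether that is preferable to byte-level BPE for a given model is a design choice this sketch does not settle.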