Customized tokenizers

Hi everyone.

Is it possible to plug in a custom tokenization algorithm and build the vocabulary from my own corpus when training a language model?
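
For context, here is a minimal sketch of the kind of thing I mean, using the Hugging Face `tokenizers` library (the corpus file, vocab size, and special tokens are just placeholders):

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers
from transformers import PreTrainedTokenizerFast

# Build a BPE tokenizer from scratch with my own pre-tokenization step
tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# Train the vocabulary on my own corpus (placeholder file and settings)
trainer = trainers.BpeTrainer(
    vocab_size=30000,
    special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"],
)
tokenizer.train(files=["my_corpus.txt"], trainer=trainer)
tokenizer.save("my-tokenizer.json")

# Wrap it so it can be passed to a model/Trainer like any other tokenizer
fast_tokenizer = PreTrainedTokenizerFast(tokenizer_file="my-tokenizer.json")
```

Is this the supported route, or is there another recommended way to swap in a fully custom algorithm?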

Thanks!