I've just published Scala bindings for the Rust Tokenizers library so that it can be used on the JVM. Currently, the bindings can only load and run pre-trained tokenizers; training is not yet possible.
While the project is currently focused on Scala, it should be straightforward to add pure Java support as well if there is interest.
Here's the project: GitHub - sbrunk/tokenizers-scala: Scala bindings for Hugging Face Tokenizers