I have a RoBERTa model working great in Python, and I want to move it into my service, which is written in Java.
To do that, I need to reproduce the Python RobertaTokenizer class in Java, since I couldn’t find an existing Java implementation of it. From what I understand (and I’m pretty new to Transformers), RobertaTokenizer is similar to SentencePiece but not exactly the same.
As a reference, I have a Java tokenizer implementation for CamemBERT that uses SentencePiece, and the Hugging Face documentation says that the CamemBERT tokenizer inherits from the RoBERTa tokenizer.
My question is: what would be the best way to implement a RoBERTa tokenizer in Java? Can I use the SentencePiece class the way the CamemBERT implementation does?
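For context, one thing I’m fairly confident about is that RoBERTa’s tokenizer is the GPT-2 style byte-level BPE: text is split with GPT-2’s pre-tokenization regex, each piece’s UTF-8 bytes are mapped to printable characters, and the merges from `merges.txt` are applied in rank order before the resulting tokens are looked up in `vocab.json`. Below is a rough sketch of that core as I’d attempt it by hand. It has not been checked against the Python tokenizer; loading the two files, the id lookup, and adding `<s>`/`</s>` are left out; and the class name and toy merge table are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of GPT-2/RoBERTa-style byte-level BPE.
 * Assumes merges are given in rank order (as in merges.txt, minus its "#version" header line).
 * Vocabulary lookup (vocab.json) and special tokens are intentionally omitted.
 */
public class ByteLevelBpeSketch {

    // GPT-2 pre-tokenization pattern: contractions, letter runs, digit runs, other symbols, whitespace.
    private static final Pattern PRE_TOKENIZE = Pattern.compile(
        "'s|'t|'re|'ve|'m|'ll|'d| ?\\p{L}+| ?\\p{N}+| ?[^\\s\\p{L}\\p{N}]+|\\s+(?!\\S)|\\s+");

    private final Map<String, Integer> mergeRanks = new HashMap<>();
    private final String[] byteToChar = buildByteToChar();

    public ByteLevelBpeSketch(List<String[]> merges) {
        // Lower index = higher priority merge.
        for (int i = 0; i < merges.size(); i++) {
            String[] pair = merges.get(i);
            mergeRanks.put(pair[0] + " " + pair[1], i);
        }
    }

    // Maps each byte 0..255 to a printable char, mirroring GPT-2's bytes_to_unicode():
    // printable Latin-1 bytes map to themselves, the rest to 256, 257, ... in byte order.
    private static String[] buildByteToChar() {
        String[] table = new String[256];
        int extra = 0;
        for (int b = 0; b < 256; b++) {
            boolean printable = (b >= '!' && b <= '~') || (b >= 0xA1 && b <= 0xAC) || (b >= 0xAE && b <= 0xFF);
            table[b] = String.valueOf((char) (printable ? b : 256 + extra++));
        }
        return table;
    }

    // Repeatedly merges the lowest-ranked adjacent pair until no ranked pair remains.
    private List<String> bpe(String word) {
        List<String> symbols = new ArrayList<>();
        for (int i = 0; i < word.length(); i++) symbols.add(String.valueOf(word.charAt(i)));
        while (symbols.size() > 1) {
            int bestRank = Integer.MAX_VALUE;
            int bestIdx = -1;
            for (int i = 0; i < symbols.size() - 1; i++) {
                Integer rank = mergeRanks.get(symbols.get(i) + " " + symbols.get(i + 1));
                if (rank != null && rank < bestRank) { bestRank = rank; bestIdx = i; }
            }
            if (bestIdx < 0) break;
            String first = symbols.get(bestIdx), second = symbols.get(bestIdx + 1);
            // Merge every non-overlapping occurrence of (first, second), left to right.
            List<String> merged = new ArrayList<>();
            for (int i = 0; i < symbols.size(); i++) {
                if (i < symbols.size() - 1 && symbols.get(i).equals(first) && symbols.get(i + 1).equals(second)) {
                    merged.add(first + second);
                    i++;
                } else {
                    merged.add(symbols.get(i));
                }
            }
            symbols = merged;
        }
        return symbols;
    }

    // Returns BPE token strings (not ids); a space becomes the "\u0120" prefix char.
    public List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        Matcher m = PRE_TOKENIZE.matcher(text);
        while (m.find()) {
            StringBuilder mapped = new StringBuilder();
            for (byte b : m.group().getBytes(StandardCharsets.UTF_8)) mapped.append(byteToChar[b & 0xFF]);
            tokens.addAll(bpe(mapped.toString()));
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Toy merge table for illustration only; the real one would come from RoBERTa's merges.txt.
        List<String[]> merges = List.of(
            new String[]{"\u0120", "l"}, new String[]{"\u0120l", "o"}, new String[]{"\u0120lo", "w"},
            new String[]{"l", "o"}, new String[]{"lo", "w"}, new String[]{"e", "r"});
        System.out.println(new ByteLevelBpeSketch(merges).tokenize("low lower"));
    }
}
```

With the toy merges, `tokenize("low lower")` returns `[low, Ġlow, er]`. With the real `merges.txt`, I’d compare the output against `RobertaTokenizer.tokenize()` in Python on a set of sample sentences to see whether this approach is close enough.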