BertTokenizerFast for stsb-xlm-r-multilingual model

Hi community,

Is there a fast tokenizer for the stsb-xlm-r-multilingual model?

Thanks!

Hi community and @lewtun,

Does anyone have an idea of how to get a fast tokenizer for the stsb-xlm-r-multilingual model?

Tokenizer computation is preventing me from reaching low-latency responses. Is there a fast tokenizer like BertTokenizerFast, or is there a way to run the tokenizer on the GPU?

hey @Matthieu, as far as i know the “fast” refers to the Rust implementation of the tokenizers: tokenizers/tokenizers at master · huggingface/tokenizers · GitHub

there are bindings for python, so perhaps you can adapt the suggestion here to your use case? e.g. download the tokenizer.json file for stsb and load the fast version as follows:

from transformers import RobertaTokenizerFast

# load the Rust-backed fast tokenizer from the downloaded tokenizer.json
tokenizer = RobertaTokenizerFast(tokenizer_file="tokenizer.json")

(i’m not super familiar with the stsb-xlm-r-multilingual model but am assuming it’s using the same tokenization strategy as XLM-R)

Hi @lewtun, thanks! I finally found that there is an XLMRobertaTokenizerFast implementation.
