Hi community,
Would there be a fast tokenizer for the stsb-xlm-r-multilingual model?
Thanks!
Hi community and @lewtun,
Does anyone have an idea how to get a fast tokenizer for the stsb-xlm-r-multilingual model?
I am blocked on achieving low-latency responses because of the tokenizer computation time. Is there a fast tokenizer for this model, analogous to BertTokenizerFast, or is there a way to run the tokenizer on a GPU?
Hey @Matthieu, as far as I know, the "fast" refers to the Rust implementations of the tokenizers: tokenizers/tokenizers at master · huggingface/tokenizers · GitHub
There are bindings for Python, so perhaps you can adapt the suggestion here to your use case? E.g. download the tokenizer.json file for stsb-xlm-r-multilingual and load the fast version as follows:
from transformers import RobertaTokenizerFast

# Load the fast (Rust-backed) tokenizer directly from the downloaded tokenizer.json
tokenizer = RobertaTokenizerFast(tokenizer_file="tokenizer.json")
(I'm not super familiar with the stsb-xlm-r-multilingual model, but I'm assuming it uses the same tokenization strategy as XLM-R.)
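If it helps, here is a minimal sketch of the full download-and-load flow. The Hub id sentence-transformers/stsb-xlm-r-multilingual is an assumption on my part (swap in the exact repo you use), and it assumes the repo ships a tokenizer.json:

from huggingface_hub import hf_hub_download
from transformers import RobertaTokenizerFast

# Repo id assumed; adjust to the exact Hub id you are using.
# Also assumes the repo hosts a tokenizer.json (the fast-tokenizer format).
tokenizer_path = hf_hub_download(
    repo_id="sentence-transformers/stsb-xlm-r-multilingual",
    filename="tokenizer.json",
)

tokenizer = RobertaTokenizerFast(tokenizer_file=tokenizer_path)
print(tokenizer.is_fast)  # True for the Rust-backed implementation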
Hi @lewtun, thanks! I finally found that there is an XLMRobertaTokenizerFast implementation.
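For anyone landing here later, a minimal sketch of that approach; the Hub id sentence-transformers/stsb-xlm-r-multilingual is an assumption, so point it at the checkpoint you actually use:

from transformers import XLMRobertaTokenizerFast

# Hub id assumed; adjust to your exact checkpoint.
tokenizer = XLMRobertaTokenizerFast.from_pretrained(
    "sentence-transformers/stsb-xlm-r-multilingual"
)

encoded = tokenizer("A quick latency test sentence.")
print(tokenizer.is_fast)  # True: tokenization now runs in the Rust backend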