Convert slow XLMRobertaTokenizer to fast one

Posting a question I had which @SaulLu answered :smiley: Suppose you have a repo on the Hub that only has slow tokenizer files, and you want to be able to load a fast tokenizer. Here’s how to do that:

```
!pip install -q transformers sentencepiece
```

```python
from transformers import XLMRobertaTokenizerFast

model_name = "naver-clova-ix/donut-base-finetuned-docvqa"

# from_slow=True converts the slow (SentencePiece) tokenizer on the fly
tokenizer = XLMRobertaTokenizerFast.from_pretrained(model_name, from_slow=True)

# legacy_format=False saves a single tokenizer.json (the fast format)
tokenizer.save_pretrained("fast_tok", legacy_format=False)
```