Fast tokenizer for MarianMTModel


I use the Helsinki-NLP/opus-mt-fr-en model to translate from French to English.

When I load the tokenizer, it is not a fast tokenizer even though I pass use_fast=True:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-fr-en", use_fast=True)
print(tokenizer)

PreTrainedTokenizer(name_or_path='Helsinki-NLP/opus-mt-fr-en', vocab_size=59514, model_max_len=512, is_fast=False, padding_side='right', truncation_side='right', special_tokens={'eos_token': '</s>', 'unk_token': '<unk>', 'pad_token': '<pad>'})

Is there no fast tokenizer for MarianMTModel?

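No, there is no fast tokenizer for Marian models, so AutoTokenizer falls back to the slow (Python) MarianTokenizer even when use_fast=True is passed. That is why the output shows is_fast=False.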
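Because the fallback happens without an error, it can be worth checking the returned tokenizer explicitly before relying on fast-only features. Below is a minimal sketch; the helper names `load_tokenizer`, `is_fast_tokenizer` and `describe` are hypothetical, not part of transformers. It only relies on `AutoTokenizer.from_pretrained(..., use_fast=True)` and the tokenizer's `is_fast` attribute, both of which appear in the question above.

```python
def is_fast_tokenizer(tok) -> bool:
    """True for Rust-backed (fast) tokenizers, False for slow Python ones.

    getattr with a default keeps this safe for objects lacking `is_fast`.
    """
    return bool(getattr(tok, "is_fast", False))


def describe(tok) -> str:
    """Short human-readable summary, e.g. 'MarianTokenizer: slow (Python fallback)'."""
    kind = "fast" if is_fast_tokenizer(tok) else "slow (Python fallback)"
    return f"{type(tok).__name__}: {kind}"


def load_tokenizer(name: str):
    """Request a fast tokenizer and report whether one was actually returned.

    Hypothetical helper: transformers is imported lazily so the two helpers
    above can be used without it installed (and without downloading a model).
    """
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained(name, use_fast=True)
    return tok, is_fast_tokenizer(tok)
```

With the answer above in mind, `tok, fast = load_tokenizer("Helsinki-NLP/opus-mt-fr-en")` should give `fast == False`. Checking this up front makes it obvious when features that need a fast tokenizer (such as `return_offsets_mapping=True` or `word_ids()`) will not be available.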
