Tokenizer for Translation Pipeline with Bert2Bert EncoderDecoder

danringwald · February 23, 2022, 10:23pm

Hello,

I would like to build a translation pipeline with a Bert2Bert EncoderDecoder model.

Calling an input tokenizer, then the custom Bert2Bert model, and finally the output tokenizer works fine. However, building a pipeline is hard because of the tokenizer: the pipeline function accepts only one tokenizer:
generator = pipeline(task="text-generation", model=model, tokenizer=tokenizer)
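Until a single tokenizer exists, the manual three-step workflow described above can be wrapped in a small helper that sidesteps `pipeline()` entirely. This is a minimal sketch, assuming `src_tokenizer` and `tgt_tokenizer` are the two already-loaded Hugging Face tokenizers (French input, English output) and `model` is the Bert2Bert `EncoderDecoderModel`. The function name `translate` is hypothetical, and passing only `input_ids` and `attention_mask` to `generate` is a deliberate choice, not a requirement.

```python
def translate(texts, model, src_tokenizer, tgt_tokenizer, **generate_kwargs):
    """Hypothetical helper: encode with the source tokenizer, generate, decode with the target one."""
    # 1. Encode the French inputs with the source (input) tokenizer as PyTorch tensors.
    enc = src_tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    # 2. Generate target-language token ids with the encoder-decoder model.
    output_ids = model.generate(
        input_ids=enc["input_ids"],
        attention_mask=enc["attention_mask"],
        **generate_kwargs,
    )
    # 3. Decode with the target (output) tokenizer, dropping special tokens such as [CLS]/[SEP]/[PAD].
    return tgt_tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Extra generation options (for example `max_length` or `num_beams`) are simply forwarded to `model.generate`.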

How can I build a “bilingual” BertTokenizerFast that encodes French inputs in the regular context and decodes English targets under the as_target_tokenizer() context (just like the MarianMT tokenizer)?
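One possible approach is a thin wrapper around two tokenizers that mimics the `as_target_tokenizer()` context manager. This is a sketch under stated assumptions, not a tested recipe: `BilingualTokenizer` is a hypothetical class, `source` and `target` are assumed to be two already-loaded `BertTokenizerFast` instances (French and English), and forwarding every other attribute to the source tokenizer via `__getattr__` is an assumption. It has not been verified that `pipeline()` accepts such a duck-typed object, since it may inspect the tokenizer or call methods not covered here.

```python
from contextlib import contextmanager


class BilingualTokenizer:
    """Hypothetical wrapper: encode with a source tokenizer, decode with a target one."""

    def __init__(self, source, target):
        self.source = source    # e.g. French BertTokenizerFast (assumed)
        self.target = target    # e.g. English BertTokenizerFast (assumed)
        self._current = source  # tokenizer used by __call__ outside the context

    @contextmanager
    def as_target_tokenizer(self):
        # Temporarily route __call__ to the target tokenizer (e.g. to encode labels),
        # then restore the previous side, even if an exception is raised.
        previous = self._current
        self._current = self.target
        try:
            yield self
        finally:
            self._current = previous

    def __call__(self, *args, **kwargs):
        # Encode with whichever side is active (source by default).
        return self._current(*args, **kwargs)

    def decode(self, *args, **kwargs):
        # Generated ids belong to the target vocabulary, so always decode with the target.
        return self.target.decode(*args, **kwargs)

    def batch_decode(self, *args, **kwargs):
        return self.target.batch_decode(*args, **kwargs)

    def __getattr__(self, name):
        # Only called when normal lookup fails. Forward unknown attributes
        # (pad_token_id, model_max_length, ...) to the source tokenizer; this
        # fallback is an untested assumption about what pipeline() needs.
        if name in ("source", "target", "_current"):
            raise AttributeError(name)
        return getattr(self.source, name)
```

Usage would then look like `tokenizer = BilingualTokenizer(fr_tokenizer, en_tokenizer)` followed by the `pipeline(...)` call. Decoding always goes through the target tokenizer because that is what the decoder generates, while the context manager only changes the encoding side, which is how labels would be encoded during training. For an encoder-decoder model, the `text2text-generation` (or `translation`) task is likely a better fit than `text-generation`, which is meant for decoder-only models.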

Any help would be greatly appreciated.
