Warm-started encoder-decoder models (Bert2Gpt2 and Bert2Bert)

Thank you @nielsr for your clarification, it's very clear. I read your notebook (Fine-tune TrOCR on the IAM Handwriting Database) and tried to understand what the TrOCRProcessor is. I found that it "wraps ViTFeatureExtractor and RobertaTokenizer into a single instance to both extract the input features and decode the predicted token ids". I then noticed that you pass processor.feature_extractor as the tokenizer argument of the Seq2SeqTrainer, as follows:

trainer = Seq2SeqTrainer(
    model=model,
    tokenizer=processor.feature_extractor,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=default_data_collator,
)
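
For context, here is a minimal sketch of how I understand the wrapping (I'm assuming the microsoft/trocr-base-handwritten checkpoint as an example; please correct me if my reading of the attributes is wrong):

from transformers import TrOCRProcessor

# The processor bundles an image feature extractor and a text tokenizer.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")

# The two wrapped components are exposed as attributes:
print(type(processor.feature_extractor))  # turns images into pixel values
print(type(processor.tokenizer))          # decodes predicted token ids into text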

That leads me to one more question:
In my case, a Bert2GPT2 model for the summarization task, what should I pass as the tokenizer argument of the Seq2SeqTrainer instead of processor.feature_extractor: the encoder tokenizer (BERT tokenizer) or the decoder tokenizer (GPT-2 tokenizer)? Note that in the outdated blog (patrickvonplaten/bert2gpt2-cnn_dailymail-fp16 · Hugging Face) the tokenizer argument was omitted entirely.
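
For reference, this is roughly how I'm setting up the two tokenizers and the warm-started model (bert-base-uncased and gpt2 are just placeholder checkpoint names here; this sketch may well not be the right way to do it):

from transformers import BertTokenizer, GPT2Tokenizer, EncoderDecoderModel

# Each side of the warm-started model has its own tokenizer.
encoder_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
decoder_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# GPT-2 has no padding token by default, so reuse EOS for padding.
decoder_tokenizer.pad_token = decoder_tokenizer.eos_token

# Warm-start the encoder-decoder model from the two pretrained checkpoints.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "gpt2"
)

# One of the two tokenizers above would go into Seq2SeqTrainer(tokenizer=...),
# and that is exactly the part I'm unsure about.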

Thanks again