Warm-started encoder-decoder models (Bert2Gpt2 and Bert2Bert)

Thank you @nielsr for your clarification, it's very clear. I read your notebook (Fine-tune TrOCR on the IAM Handwriting Database) and tried to understand what the TrOCRProcessor is. I found that it "wraps ViTFeatureExtractor and RobertaTokenizer into a single instance to both extract the input features and decode the predicted token ids". I then noticed that you pass processor.feature_extractor as the tokenizer argument of the Seq2SeqTrainer, as follows:

trainer = Seq2SeqTrainer(
    model=model,
    tokenizer=processor.feature_extractor,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=default_data_collator,
)
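
For context, here is a minimal sketch of how I understand the wrapping (I'm assuming the microsoft/trocr-base-handwritten checkpoint as an example; please correct me if my reading of the attributes is wrong):

from transformers import TrOCRProcessor

# The processor bundles an image feature extractor and a text tokenizer.
processor = TrOCRProcessor.from_pretrained("microsoft/trocr-base-handwritten")

# The two wrapped components are exposed as attributes:
print(type(processor.feature_extractor))  # turns images into pixel values
print(type(processor.tokenizer))          # decodes predicted token ids into text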

That leads me to one more question:
In my case, a Bert2GPT2 model for the summarization task, what should I pass as the tokenizer argument of the Seq2SeqTrainer instead of processor.feature_extractor: the encoder tokenizer (BERT tokenizer) or the decoder tokenizer (GPT-2 tokenizer)? Note that in the outdated blog (patrickvonplaten/bert2gpt2-cnn_dailymail-fp16 · Hugging Face) the tokenizer argument was omitted entirely.
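
For reference, this is roughly how I'm setting up the two tokenizers and the warm-started model (bert-base-uncased and gpt2 are just placeholder checkpoint names here; this sketch may well not be the right way to do it):

from transformers import BertTokenizer, GPT2Tokenizer, EncoderDecoderModel

# Each side of the warm-started model has its own tokenizer.
encoder_tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
decoder_tokenizer = GPT2Tokenizer.from_pretrained("gpt2")

# GPT-2 has no padding token by default, so reuse EOS for padding.
decoder_tokenizer.pad_token = decoder_tokenizer.eos_token

# Warm-start the encoder-decoder model from the two pretrained checkpoints.
model = EncoderDecoderModel.from_encoder_decoder_pretrained(
    "bert-base-uncased", "gpt2"
)

# One of the two tokenizers above would go into Seq2SeqTrainer(tokenizer=...),
# and that is exactly the part I'm unsure about.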

Thanks again