Training RoBERTa for RAG

Hey, I’m considering training a RAG (Retrieval-Augmented Generation) model from scratch, and I want to start preparing for this project by first training the generator used in the RAG architecture. The RAG documentation mentions that BartForConditionalGeneration was used for this purpose, but I would like to use RoBERTa instead. Could you confirm whether my line of thinking is correct?

Here’s my plan:

  1. I would start by training a RobertaForMaskedLM model on Wikipedia dumps (first sketch below).
  2. Then, I would load the trained weights into RobertaForQuestionAnswering and fine-tune on a SQuAD-style dataset (second sketch below).
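
Here is roughly what I have in mind for step 1. This is only a sketch, not a finished recipe: the dataset name, the reuse of the roberta-base tokenizer, and all hyperparameters are placeholders, and a real from-scratch run would presumably first train its own tokenizer on the same corpus.

```python
# Sketch of step 1: pretraining RobertaForMaskedLM on a Wikipedia dump.
# Dataset name, tokenizer choice, and hyperparameters are placeholders.
from datasets import load_dataset
from transformers import (
    DataCollatorForLanguageModeling,
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
)

# Reusing the pretrained roberta-base tokenizer for simplicity; a true
# from-scratch run would train its own BPE vocabulary first.
tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# Fresh, randomly initialized weights (training from scratch, not fine-tuning).
config = RobertaConfig(vocab_size=tokenizer.vocab_size)
model = RobertaForMaskedLM(config)

# "wikipedia" is a stand-in; any cleaned wiki-dump text dataset works.
dataset = load_dataset("wikipedia", "20220301.en", split="train[:1%]")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

# The collator applies RoBERTa-style dynamic masking (15% of tokens per batch).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-mlm", per_device_train_batch_size=8),
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
trainer.save_model("roberta-mlm")
```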
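
And this is how I picture step 2. Again just a sketch: the "roberta-mlm" checkpoint path comes from the first sketch, and the SQuAD preprocessing is heavily simplified compared to the official examples.

```python
# Sketch of step 2: reloading the MLM checkpoint as a QA model and
# fine-tuning on a SQuAD-style dataset.
from datasets import load_dataset
from transformers import (
    RobertaForQuestionAnswering,
    RobertaTokenizerFast,
    Trainer,
    TrainingArguments,
    default_data_collator,
)

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")

# from_pretrained reuses the encoder weights saved in step 1; the QA head
# (qa_outputs) does not exist in the MLM checkpoint and is freshly initialized.
model = RobertaForQuestionAnswering.from_pretrained("roberta-mlm")

squad = load_dataset("squad", split="train[:1%]")

def preprocess(examples):
    # Tokenize question/context pairs and map the character-level answer
    # spans onto token positions via the offset mapping.
    tokenized = tokenizer(
        examples["question"],
        examples["context"],
        truncation="only_second",
        max_length=384,
        padding="max_length",
        return_offsets_mapping=True,
    )
    start_positions, end_positions = [], []
    for i, offsets in enumerate(tokenized["offset_mapping"]):
        answer = examples["answers"][i]
        start_char = answer["answer_start"][0]
        end_char = start_char + len(answer["text"][0])
        seq_ids = tokenized.sequence_ids(i)
        # Find the context tokens whose character spans cover the answer;
        # fall back to position 0 if the answer was truncated away.
        start_tok = end_tok = 0
        for idx, (s, e) in enumerate(offsets):
            if seq_ids[idx] != 1:  # skip question and special tokens
                continue
            if s <= start_char < e:
                start_tok = idx
            if s < end_char <= e:
                end_tok = idx
        start_positions.append(start_tok)
        end_positions.append(end_tok)
    tokenized["start_positions"] = start_positions
    tokenized["end_positions"] = end_positions
    tokenized.pop("offset_mapping")
    return tokenized

train_set = squad.map(preprocess, batched=True, remove_columns=squad.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="roberta-qa", per_device_train_batch_size=8),
    train_dataset=train_set,
    data_collator=default_data_collator,
)
trainer.train()
```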

Can I switch the model type from RobertaForMaskedLM to RobertaForQuestionAnswering like this without any consequences? And is RobertaForQuestionAnswering an appropriate model to use as the generator in the RAG architecture? (I assume it is, since I take it to be a seq2seq model, but please confirm.)
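
For context, this is the kind of sanity check I was planning to run after the switch ("roberta-mlm" is again the hypothetical checkpoint from my first sketch):

```python
# Sanity-checking the MLM -> QA model-type switch.
import torch
from transformers import RobertaForQuestionAnswering, RobertaTokenizerFast

tokenizer = RobertaTokenizerFast.from_pretrained("roberta-base")
# The encoder weights load from the checkpoint; transformers warns that the
# qa_outputs head is newly initialized, since the MLM checkpoint has no QA head.
model = RobertaForQuestionAnswering.from_pretrained("roberta-mlm")

inputs = tokenizer(
    "Who introduced RoBERTa?",
    "RoBERTa was introduced by Facebook AI in 2019.",
    return_tensors="pt",
)
with torch.no_grad():
    outputs = model(**inputs)

# The QA head scores every input token as a possible answer-span start/end,
# so the output is two logit vectors over the input rather than generated text.
print(outputs.start_logits.shape, outputs.end_logits.shape)
```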
Thanks in advance.