RAG: Do we need to pretrain the doc-encoder when using a custom dataset?

shamanez · October 26, 2020, 8:42pm

The Hugging Face RAG implementation now includes a script that lets us use a custom dataset instead of the wiki dataset.

In the fine-tuning phase of RAG, we do not update the doc-encoder (we update only BART and the question encoder). So what if our custom dataset has a different distribution from the wiki dataset (e.g., medical records)?

Will it still work?

P.S. In the RAG paper, the authors just used the pretrained DPR and never updated the doc-encoder weights during fine-tuning.
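
To make the setup concrete, here is a toy sketch in plain Python. It is not the actual RAG or DPR code: `DOC_INDEX`, `encode_query`, `retrieve`, and `train_step` are hypothetical names, and the two-dimensional vectors are made up. The document vectors are computed once and never touched again (standing in for the frozen doc-encoder and its prebuilt index), while only the question-side weights receive gradient updates.

```python
import copy
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


# Stand-in for the index built once by the frozen doc-encoder (made-up vectors).
DOC_INDEX = {"doc_a": [1.0, 0.0], "doc_b": [0.0, 1.0]}


def encode_query(weights, features):
    # Toy linear "question encoder": one weight row per output dimension.
    return [dot(row, features) for row in weights]


def retrieve(weights, features):
    # Return the document whose frozen vector has the highest inner product.
    q = encode_query(weights, features)
    return max(DOC_INDEX, key=lambda name: dot(q, DOC_INDEX[name]))


def train_step(weights, features, gold_doc, lr=0.5):
    # One gradient step on -log softmax(score of gold_doc).
    # Only the question-side weights change; DOC_INDEX is read, never written.
    q = encode_query(weights, features)
    scores = {name: dot(q, vec) for name, vec in DOC_INDEX.items()}
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    probs = {name: e / total for name, e in exps.items()}
    # Gradient w.r.t. the query vector: expected doc vector minus gold doc vector.
    grad_q = [
        sum(probs[name] * DOC_INDEX[name][i] for name in DOC_INDEX)
        - DOC_INDEX[gold_doc][i]
        for i in range(len(q))
    ]
    return [
        [w - lr * grad_q[i] * features[j] for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


index_before = copy.deepcopy(DOC_INDEX)
weights = [[1.0, 0.0], [0.0, 0.0]]
question = [1.0, 0.0]

before = retrieve(weights, question)
for _ in range(10):
    weights = train_step(weights, question, gold_doc="doc_b")
after = retrieve(weights, question)

print(before, after)
```

In this toy case the question side can adapt to the fixed document vectors. Whether that carries over when the frozen doc-encoder embeds out-of-domain text poorly in the first place is exactly what I am asking.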
