Fine-tuning LLM for RAG

I am currently fine-tuning an LLM on a custom QA dataset, and was wondering whether there is a significant difference between running QLoRA fine-tuning on a model initialized with AutoModelForCausalLM versus one initialized with AutoModelForQuestionAnswering. If there is a significant difference, which of the two is preferable?

For additional context, the dataset I will be fine-tuning on consists of three columns: question, context, and answer. The three columns are formatted into a prompt along the lines of

### Instruction
Use the context below to generate an answer to the provided question. If the context does not contain sufficient information, state that an answer could not be found.

### Context

### Question

### Response
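To make the setup concrete, here is a minimal sketch of how each row might be rendered into that template; the function name `format_example` is just an illustration, not part of any library:

```python
def format_example(question: str, context: str, answer: str) -> str:
    """Render one dataset row (question, context, answer) into the prompt above."""
    return (
        "### Instruction\n"
        "Use the context below to generate an answer to the provided question. "
        "If the context does not contain sufficient information, state that an "
        "answer could not be found.\n\n"
        f"### Context\n{context}\n\n"
        f"### Question\n{question}\n\n"
        f"### Response\n{answer}"
    )
```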


The difference is generative vs. extractive question answering.

The AutoModelForQuestionAnswering class is meant for extractive question answering: the model is a classifier that predicts which token of the context marks the start of the answer and which token marks the end, so the answer is always a literal span copied from the context.
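A toy illustration of that span-prediction mechanism (no real model involved; the scores are made up): the head outputs one "start" score and one "end" score per context token, and the answer is the text between the two argmax positions.

```python
# Fabricated per-token scores standing in for a QA head's start/end logits.
context_tokens = ["The", "Eiffel", "Tower", "is", "in", "Paris", "."]
start_scores = [0.1, 0.2, 0.1, 0.1, 0.1, 0.9, 0.0]  # peak at "Paris"
end_scores   = [0.0, 0.1, 0.1, 0.1, 0.1, 0.9, 0.2]  # peak at "Paris"

# The predicted answer is the span between the argmax start and end positions.
start = max(range(len(start_scores)), key=start_scores.__getitem__)
end = max(range(len(end_scores)), key=end_scores.__getitem__)
answer = " ".join(context_tokens[start:end + 1])
print(answer)  # Paris
```

Note the consequence: the model can only ever return a substring of the context, which is why this head does not fit a free-form prompt like yours.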

The AutoModelForCausalLM class is meant for generative question answering: the model is a generative model (like ChatGPT) that can produce any text given the context and question. This means the model can give answers that go beyond the literal text of the context. Since your prompt expects a free-form response (including "an answer could not be found" when the context is insufficient), AutoModelForCausalLM is the appropriate choice here.
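For reference, a typical QLoRA setup on a causal LM looks roughly like the sketch below (a config fragment, not a full training script; the base model name and all hyperparameters are placeholders, not recommendations):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder base model

# 4-bit quantization of the frozen base weights -- the "Q" in QLoRA.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(  # causal LM, not QuestionAnswering
    model_name, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Trainable low-rank adapters on the attention projections.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
```

The important detail is `task_type="CAUSAL_LM"`: the adapters and the loss are wired for next-token prediction over your formatted prompt, which matches the generative setup described above.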