I have two datasets:
- ~150 internal help documents (knowledge base)
- ~3k search query → answer pairs
I want to build a QA system.
I’ve fine-tuned a Pythia base model on the query-answer pairs (using the Dolly v2 code), and it works pretty well as an instruction model.
My question is: how should I include the KB articles (the first dataset)? They contain a lot of good internal data and concepts that the model should know about. Should I first continue pretraining the Pythia base model on the KB docs, and then instruction-tune it on the query-answer pairs?
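
To make the continued-pretraining idea concrete, here is a rough sketch of the data prep I have in mind for the KB docs: concatenate the articles into one token stream, separated by an end-of-text marker, and cut it into fixed-size blocks for causal LM training. The function name `pack_documents`, the sample docs, and the tiny `block_size` are made up for illustration, and the whitespace split is only a stand-in for the real Pythia tokenizer.

```python
from typing import Iterable, List

# Assumption: "<|endoftext|>" is the document separator used by the tokenizer.
EOS = "<|endoftext|>"


def pack_documents(docs: Iterable[str], block_size: int = 8) -> List[List[str]]:
    """Concatenate docs (each followed by EOS) and split into equal-size blocks."""
    stream: List[str] = []
    for doc in docs:
        # Whitespace split is a placeholder for a real subword tokenizer.
        stream.extend(doc.split())
        stream.append(EOS)
    # Drop the trailing remainder so every block has exactly block_size tokens.
    n_full = len(stream) // block_size
    return [stream[i * block_size:(i + 1) * block_size] for i in range(n_full)]


# Hypothetical KB articles, just to show the shape of the output.
kb_docs = ["How to reset your VPN password", "Expense reports are due monthly"]
blocks = pack_documents(kb_docs, block_size=4)
for block in blocks:
    print(block)
```

The resulting blocks would feed the continued-pretraining stage (plain next-token prediction), and the instruction tuning on the ~3k query → answer pairs would then run on top of that checkpoint.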