If I want to adapt a base (foundation) model (e.g., Llama or GPT-J-6B) to a custom domain and I have many unstructured documents, how do I feed these documents into the base model? My guess is that this step has to come before instruction fine-tuning and RLHF, so that the model acquires the domain knowledge first. Is this correct?
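To make the question concrete, here is a rough sketch of what I mean by "feeding documents": concatenating the raw texts into one token stream and cutting it into fixed-length blocks, as is commonly done when preparing data for causal language model training. Everything in it is an assumption for illustration: `toy_tokenize` is a whitespace stand-in for the model's real tokenizer, `EOS` is a hypothetical document separator, and the two example documents are made up.

```python
from typing import Iterable, List

EOS = "<eos>"  # hypothetical end-of-document marker, not a real model token


def toy_tokenize(text: str) -> List[str]:
    # Stand-in for the model's real subword tokenizer: split on whitespace.
    return text.split()


def pack_documents(docs: Iterable[str], block_size: int) -> List[List[str]]:
    # Concatenate all documents into one token stream, with EOS after each.
    stream: List[str] = []
    for doc in docs:
        stream.extend(toy_tokenize(doc))
        stream.append(EOS)
    # Cut the stream into equal-length blocks and drop the incomplete tail.
    usable = (len(stream) // block_size) * block_size
    return [stream[i:i + block_size] for i in range(0, usable, block_size)]


# Made-up domain documents, purely for illustration.
docs = ["pumps need weekly inspection", "valve V7 is rated to 40 bar"]
blocks = pack_documents(docs, block_size=4)
for block in blocks:
    print(block)
```

In an actual run, I assume each block would be converted to token IDs and used as both the input and the next-token prediction target, which is what I understand continued pretraining on domain text to be.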