Here is what I did to train my LLM on our documents. I used the GPT-4 API and wrote Python code that sends the text of each document and asks GPT for 20 questions per document. The response came back in JSON format, with INPUT (question + document text) and OUTPUT (the answer). Then I fine-tuned my model on this data, and it works pretty impressively. I also added a vector database where I store new documents, and it works very well even on documents the LLM was not fine-tuned on.
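A minimal sketch of the dataset-building step described above, not the exact code used. It assumes the model is asked to return a JSON list of question/answer pairs, and that the INPUT field (question + document text) is assembled locally rather than echoed back by the model. The names `build_prompt`, `make_examples`, `write_jsonl`, and the `ask_model` callable are hypothetical; `ask_model` stands in for whatever function sends a prompt to the GPT-4 API and returns the reply text.

```python
import json
from typing import Callable, Dict, List

N_QUESTIONS = 20  # questions requested per document, as in the post


def build_prompt(doc_text: str, n: int = N_QUESTIONS) -> str:
    """Build the instruction sent to the model for one document."""
    return (
        f"Read the document below and write {n} question/answer pairs about it.\n"
        'Reply with only a JSON list of objects shaped like '
        '{"question": "...", "answer": "..."}.\n\n'
        f"Document:\n{doc_text}"
    )


def make_examples(doc_text: str, ask_model: Callable[[str], str]) -> List[Dict[str, str]]:
    """Turn one document into INPUT/OUTPUT fine-tuning records.

    ask_model is a hypothetical callable: prompt text in, raw reply text out.
    """
    raw_reply = ask_model(build_prompt(doc_text))
    pairs = json.loads(raw_reply)  # raises ValueError if the reply is not valid JSON
    examples = []
    for pair in pairs:
        examples.append({
            # INPUT = question + document text, OUTPUT = answer
            "INPUT": f"{pair['question']}\n\n{doc_text}",
            "OUTPUT": pair["answer"],
        })
    return examples


def write_jsonl(examples: List[Dict[str, str]], path: str) -> None:
    """Write one JSON record per line, a common layout for fine-tuning data."""
    with open(path, "w", encoding="utf-8") as f:
        for example in examples:
            f.write(json.dumps(example, ensure_ascii=False) + "\n")
```

Passing the API call in as `ask_model` keeps the parsing logic testable without network access. In practice you would probably also want to retry or skip documents where the reply is not valid JSON.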