Fine-tuning conversational models on technical documentation

Hello.

For context, I am currently working in IT maintenance on a large accounting application for a bank.
While using different LLMs and discovering fine-tuning, I wondered: is it possible to fine-tune a conversational model on data from the application's technical documentation?

If I am not mistaken, this would greatly speed up the maintenance team's work: we could simply ask the model questions about the application.

Will fine-tuning allow the model to retain the “knowledge” of the application?
The application contains many obscure variable names. Will the Llama tokenizer handle them well? (See the snippet below.)
Would we get better results by simply passing the information in the LLM's context, or via file uploads as GPT-4 now allows?
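For reference, here is roughly how I would check what the tokenizer does with such names. This is a minimal sketch with the `transformers` library; the variable names are made-up stand-ins for ours, and the Llama checkpoint on the Hub is gated, so substitute any tokenizer you have access to:

```python
from transformers import AutoTokenizer

# Placeholder checkpoint: Llama weights on the Hugging Face Hub require
# accepting a license, so swap in any tokenizer you can access.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

# Hypothetical "obscure" variable names from an accounting codebase.
for name in ["CPTE_AUX_DEB", "mt_sold_prov", "ZLEDG04_TMP"]:
    tokens = tokenizer.tokenize(name)
    print(f"{name!r} -> {len(tokens)} tokens: {tokens}")
```

If each identifier explodes into many short tokens, the model has a harder time treating it as one concept, which is part of why I am unsure fine-tuning alone would work.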

This is a common use case for LLMs 🙂. You will want to research “Retrieval-Augmented Generation” (RAG): the process of letting an LLM reference a knowledge base outside of its training data before generating a response.
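To make that concrete, here is a minimal RAG sketch: embed the documentation chunks once, then at question time retrieve the most similar chunks and prepend them to the prompt. Everything here is illustrative (the sample docs are invented, and I am assuming the `sentence-transformers` library with the `all-MiniLM-L6-v2` model for embeddings); a production setup would use a proper vector store and your actual LLM.

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative documentation chunks; in practice, split your real docs.
docs = [
    "CPTE_AUX_DEB holds the auxiliary debit account for batch postings.",
    "The nightly job ZLEDG04 reconciles the general ledger with sub-ledgers.",
    "mt_sold_prov stores the provisional balance before end-of-day close.",
]

# Embed the documentation once (the indexing step).
embedder = SentenceTransformer("all-MiniLM-L6-v2")
doc_embeddings = embedder.encode(docs, convert_to_tensor=True)

def build_prompt(question: str, top_k: int = 2) -> str:
    """Retrieve the most similar chunks and prepend them to the question."""
    q_embedding = embedder.encode(question, convert_to_tensor=True)
    hits = util.semantic_search(q_embedding, doc_embeddings, top_k=top_k)[0]
    context = "\n".join(docs[hit["corpus_id"]] for hit in hits)
    return f"Answer using this documentation:\n{context}\n\nQuestion: {question}"

# The resulting prompt is what you send to whatever LLM you use.
print(build_prompt("What does CPTE_AUX_DEB contain?"))
```

The nice part for a maintenance team is that updating the knowledge base only means re-embedding the changed documents, with no retraining involved.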

But how can one use their own dataset to fine-tune a model alongside RAG? I'm new to this tech; if there is any doc or video you know of, a link would be very beneficial for me.