I want to fine-tune a language model on thousands of specialized legal documents that I have authored. From what I have tested on Hugging Face, it is possible to fine-tune LLMs using a question-answer format. Do you plan to support the kind of fine-tuning I am interested in? Are there any technical obstacles that make such a process impossible? How can I do this?
What will you use your model for? You don’t have to train just for QA. If your goal is retrieval, then you can consider the approach described in Getting Started With Embeddings.
I want the LLM to draft preliminary responses to incoming legal inquiries, based on legal documents I have previously created. I estimate that about 80% of my work is repetitive, and I would like to make it a bit easier.
Consider embeddings, as in the link I shared: you can get started without any training. It then becomes a retrieval problem over content you already have, rather than text generation.
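To make the retrieval idea concrete, here is a minimal, stdlib-only sketch of the matching step. It assumes document embeddings have already been computed (e.g. with a sentence-transformers model, as in the linked guide); the toy 3-dimensional vectors and document names below are placeholders for real embeddings of your legal documents.

```python
# Sketch of embedding-based retrieval: rank stored documents by cosine
# similarity to the embedded query. Toy vectors stand in for embeddings
# produced by a real model (as in "Getting Started With Embeddings").
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical precomputed embeddings, one per stored legal document.
doc_embeddings = {
    "lease_termination_reply": [0.9, 0.1, 0.0],
    "contract_breach_reply":   [0.1, 0.9, 0.1],
}

# Embedding of the incoming inquiry (would come from the same model).
query_embedding = [0.8, 0.2, 0.1]

# Return the stored document most similar to the query.
best = max(doc_embeddings,
           key=lambda name: cosine_sim(query_embedding, doc_embeddings[name]))
print(best)  # -> lease_termination_reply
```

In practice you would embed each incoming inquiry with the same model used for your documents, then surface the top-k matches as candidate reply templates.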
Thank you, Panigrah! I will experiment with embedding techniques over the next few weeks and let you know how it goes.
You could also consider continued pre-training of an LLM (further pre-train a model such as Mistral 7B on your documents), then fine-tune it with a few examples.
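Continued pre-training on raw documents usually starts with a data-preparation step: concatenating the texts and packing them into fixed-length token blocks. Here is a stdlib-only sketch of that packing step, with a whitespace "tokenizer" and a `</s>` separator standing in for a real tokenizer (such as the one shipped with Mistral 7B); the sample documents are invented.

```python
# Sketch of data packing for continued pre-training: concatenate raw
# documents and split the token stream into fixed-length blocks.
# A whitespace split stands in for a real tokenizer.
def pack_into_blocks(documents, block_size):
    tokens = []
    for doc in documents:
        tokens.extend(doc.split())  # toy tokenization
        tokens.append("</s>")       # separator between documents
    # Drop the ragged tail so every block has exactly block_size tokens.
    n_blocks = len(tokens) // block_size
    return [tokens[i * block_size:(i + 1) * block_size]
            for i in range(n_blocks)]

docs = ["first legal memo text here", "second memo text"]
blocks = pack_into_blocks(docs, block_size=4)
print(len(blocks), blocks[0])  # -> 2 ['first', 'legal', 'memo', 'text']
```

These blocks would then be fed to a causal-language-modeling training loop (for example the Hugging Face `Trainer`), before the final fine-tuning pass on a handful of labeled examples.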