I want to fine-tune a language model on thousands of specialized legal documents that I have authored. From what I have tested on Hugging Face, it is possible to fine-tune LLMs using a question-answer format. Do you plan to support the kind of fine-tuning I am interested in? Are there any technical obstacles that make such a process impossible? How can I do this?
What will you use your model for? You don’t have to train just for QA. If your goal is retrieval, then you can consider the approach described in Getting Started With Embeddings.
I want the LLM to draft preliminary responses to incoming legal inquiries, based on legal documents I have previously created. I estimate that about 80% of my work is repetitive, and I would like to make it a bit easier.
Consider embeddings, as in the link I shared: you can get started without any training. It then becomes a retrieval problem over content you already have, rather than text generation.
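To make the retrieval idea concrete, here is a minimal, stdlib-only sketch of the matching step. It assumes document embeddings have already been computed (e.g. with a sentence-transformers model, as in the linked guide); the toy 3-dimensional vectors and document names below are placeholders for real embeddings of your legal documents.

```python
# Sketch of embedding-based retrieval: rank stored documents by cosine
# similarity to the embedded query. Toy vectors stand in for embeddings
# produced by a real model (as in "Getting Started With Embeddings").
import math

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical precomputed embeddings, one per stored legal document.
doc_embeddings = {
    "lease_termination_reply": [0.9, 0.1, 0.0],
    "contract_breach_reply":   [0.1, 0.9, 0.1],
}

# Embedding of the incoming inquiry (would come from the same model).
query_embedding = [0.8, 0.2, 0.1]

# Return the stored document most similar to the query.
best = max(doc_embeddings,
           key=lambda name: cosine_sim(query_embedding, doc_embeddings[name]))
print(best)  # -> lease_termination_reply
```

In practice you would embed each incoming inquiry with the same model used for your documents, then surface the top-k matches as candidate reply templates.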
Thank you, Panigrah! I will experiment with embedding techniques over the next few weeks and let you know how it goes.
You could also consider continued pre-training of an LLM (further pre-train a model such as Mistral 7B on your documents), then fine-tune it with a few examples.
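Continued pre-training on raw documents usually starts with a data-preparation step: concatenating the texts and packing them into fixed-length token blocks. Here is a stdlib-only sketch of that packing step, with a whitespace "tokenizer" and a `</s>` separator standing in for a real tokenizer (such as the one shipped with Mistral 7B); the sample documents are invented.

```python
# Sketch of data packing for continued pre-training: concatenate raw
# documents and split the token stream into fixed-length blocks.
# A whitespace split stands in for a real tokenizer.
def pack_into_blocks(documents, block_size):
    tokens = []
    for doc in documents:
        tokens.extend(doc.split())  # toy tokenization
        tokens.append("</s>")       # separator between documents
    # Drop the ragged tail so every block has exactly block_size tokens.
    n_blocks = len(tokens) // block_size
    return [tokens[i * block_size:(i + 1) * block_size]
            for i in range(n_blocks)]

docs = ["first legal memo text here", "second memo text"]
blocks = pack_into_blocks(docs, block_size=4)
print(len(blocks), blocks[0])  # -> 2 ['first', 'legal', 'memo', 'text']
```

These blocks would then be fed to a causal-language-modeling training loop (for example the Hugging Face `Trainer`), before the final fine-tuning pass on a handful of labeled examples.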