Seeking Advice on Fine-Tuning LLMs for Generating Documents

Hello everyone,

I hope this message finds you well. I am currently working on a project that involves generating technical documents, specifically CCTP (Cahier des Clauses Techniques Particulières), from DQE documents (a DQE summarizes the content to be drafted in the CCTP) using large language models (LLMs). I have access to several examples of both CCTP and DQE, as well as an A100 40 GB GPU.

I am looking for advice on the following aspects:

  1. Model Selection: Which open-source LLMs would be best suited for fine-tuning to generate detailed technical documents? I am considering models like BLOOM, T5, BART, Llama, and Mistral.
  2. Data Preparation: How should I structure and prepare my training data to effectively fine-tune these models? I have extracted text from PDFs and need guidance on annotation and creating input-output pairs.
  3. Fine-Tuning Process: Any tips or best practices for fine-tuning these models on my specific task? I am particularly interested in ensuring the generated documents are accurate and coherent.
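On the data-preparation question above, one common approach is to turn each DQE/CCTP pair into an instruction-style record and write the result as JSONL, the format most supervised fine-tuning scripts (e.g. those built on Hugging Face libraries) accept. A minimal sketch follows; the field names, the instruction wording, and the example French snippets are illustrative assumptions, not a fixed standard:

```python
import json

# Hypothetical example pair: in practice these strings would come from
# text extracted from your DQE and CCTP PDFs.
pairs = [
    {
        "dqe": "Lot 2 - Menuiseries extérieures : fourniture et pose de 12 fenêtres...",
        "cctp": "Article 2.1 - Menuiseries extérieures. Les fenêtres seront...",
    },
]

def to_training_record(pair):
    """Convert one DQE/CCTP pair into an instruction-style record
    (instruction / input / output), a layout many SFT scripts expect."""
    return {
        "instruction": "Rédige le CCTP correspondant au DQE suivant.",
        "input": pair["dqe"],
        "output": pair["cctp"],
    }

records = [to_training_record(p) for p in pairs]

# Write one JSON object per line (JSONL).
with open("train.jsonl", "w", encoding="utf-8") as f:
    for rec in records:
        f.write(json.dumps(rec, ensure_ascii=False) + "\n")
```

Long documents will likely need to be split into aligned sections (e.g. per lot or per article) rather than fed as whole files, since most open models have limited context windows.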

I would greatly appreciate any insights, resources, or experiences shared by the community. Thank you in advance for your help!

Best regards,


If it’s an error or a similar issue, we can help to some extent on this forum. However, for specialized topics such as LLM fine-tuning or training know-how for generative AI, I think it’s more reliable to ask on the HF Discord.