I am looking to incorporate an enterprise knowledge base into an LLM so that it is more well versed in the domain. My initial research indicated two paths forward: 1. continued pre-training and 2. supervised fine-tuning (SFT). This is my understanding so far: with SFT there are two branches. In completion-only training, the loss is computed not on the prompt but only on the answer/completion, which enhances the Q&A capabilities of the model. There is also a language-modeling style, where the model is trained on both the prompt and the completion. The confusing part for me is how language-modeling fine-tuning differs from pre-training. Is the difference mainly the data size? I would love to know what the effective ways are to instill new enterprise knowledge into the model.
First, let’s sort out the confusion. In this case, I think fine-tuning is all that’s needed. Some people do run pre-training from scratch with Hugging Face for experiments, but most of the well-known models are already pre-trained, so you can start from one of those.
In other words, it’s fine to use SFT or other fine-tuning methods alone.
What’s important is the method you use to train the model, the model you choose as a base, how you prepare the dataset so that it is easy for the model to learn from and as error-free as possible (there is also research suggesting that if the dataset contains errors, learning efficiency drops hopelessly…), and the hyperparameters used for training.
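As a rough illustration of that last point, here is a minimal sketch of a fine-tuning run with the Hugging Face `transformers` Trainer. The model name, dataset file, and hyperparameter values are placeholders, not recommendations:

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

base_model = "meta-llama/Llama-2-7b-hf"   # placeholder base model
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model)

# Placeholder: a JSONL file with a "text" column of cleaned enterprise examples
dataset = load_dataset("json", data_files="enterprise_corpus.jsonl")["train"]
dataset = dataset.map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=1024),
    remove_columns=dataset.column_names,
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,          # common starting point, tune for your data
    num_train_epochs=2,
    warmup_ratio=0.03,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Small learning rates and one or two epochs are common starting points for fine-tuning; the right values depend on your data, so they are worth sweeping.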
The following was generated by a chatbot, so skip it as you see fit; it is only for reference, e.g. for terminology.
The differences between pre-training, fine-tuning, and SFT (Supervised Fine-Tuning) in language modeling, particularly in instilling new knowledge, can be understood through their distinct roles and processes:
Pre-Training:
Purpose: Establishes a general understanding of language.
Process: Involves exposure to large, diverse, unlabeled datasets.
Knowledge Instillation: Builds a broad linguistic foundation, enabling the model to understand various contexts and patterns.
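Concretely, "exposure to large, unlabeled datasets" means next-token prediction on raw text: the labels are just the text itself shifted by one token. A minimal sketch (the model name is a placeholder; real pre-training starts from randomly initialized weights and vastly more data):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Raw, unlabeled text straight from the corpus
batch = tokenizer(["The quarterly report showed that revenue grew by 12%."],
                  return_tensors="pt")

# For causal language modeling the labels are the input ids themselves;
# the model shifts them internally so each token predicts the next one.
outputs = model(**batch, labels=batch["input_ids"])
print(outputs.loss)  # cross-entropy over next-token predictions
```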
Fine-Tuning:
Purpose: Adapts the model to specific tasks or domains.
Process: Refines the pre-trained model using task-specific data.
Techniques: Includes methods like SFT and RLHF, with each focusing on different aspects of task adaptation.
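As one concrete (and cheap) way of doing that adaptation, parameter-efficient methods such as LoRA freeze the pre-trained weights and train small adapter matrices instead. This is just one option, not something the summary above prescribes, and the target module names below assume a Llama-style architecture:

```python
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")  # placeholder

lora_config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # assumption: Llama-style attention
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```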
Supervised Fine-Tuning (SFT):
Purpose: Enhances performance on specific tasks through structured learning.
Process: Uses labeled input-output pairs to improve task-specific outputs.
Knowledge Instillation: Teaches the model to produce desired outputs for specific inputs, refining its task-oriented abilities.
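In practice the labeled input-output pairs are usually rendered into a prompt template before tokenization. A small sketch; the template and field names are arbitrary placeholders:

```python
def format_example(example):
    # One labeled pair from the enterprise Q&A data, e.g.
    # {"question": "What is our refund policy?", "answer": "Refunds are ..."}
    prompt = f"### Question:\n{example['question']}\n\n### Answer:\n"
    return {"prompt": prompt, "text": prompt + example["answer"]}

pair = {"question": "What is our refund policy?",
        "answer": "Refunds are issued within 30 days of purchase."}
print(format_example(pair)["text"])
```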
Conclusion:
Pre-training lays the groundwork by providing general knowledge, which is essential for versatile language understanding.
Fine-tuning, including SFT, then specializes this knowledge, allowing the model to excel in particular areas by adapting to specific tasks through targeted data and methods. This layered approach ensures models are both broadly capable and highly effective in specialized applications.
To incorporate an enterprise knowledge base into a large language model (LLM), supervised fine-tuning (SFT) offers two primary approaches: completion-only and language modeling. Here’s a structured summary of the considerations and conclusions:
Completion-Only Approach:
Focus: Computes the training loss only on the completion (answer) tokens, which enhances the model’s Q&A capabilities.
Use Case: Suitable for improving the model’s ability to answer specific domain-related questions, such as FAQs.
Efficiency: Potentially more efficient for tasks requiring precise responses.
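Completion-only training is typically implemented by masking the prompt tokens out of the loss, e.g. by setting their labels to -100 (the value PyTorch’s cross-entropy ignores). A hand-rolled sketch, reusing the prompt format from earlier; libraries such as TRL also ship collators that do this masking for you:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer

def build_completion_only_labels(prompt, completion):
    full = tokenizer(prompt + completion, return_tensors="pt")
    prompt_len = len(tokenizer(prompt)["input_ids"])

    labels = full["input_ids"].clone()
    labels[:, :prompt_len] = -100  # ignored by the loss, so only the
                                   # completion tokens drive the gradient
    full["labels"] = labels
    return full

batch = build_completion_only_labels(
    "### Question:\nWhat is our refund policy?\n\n### Answer:\n",
    "Refunds are issued within 30 days of purchase.",
)
```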
Language Modeling Approach:
Focus: Trains the model on both prompts and completions, improving understanding and coherence in responses.
Use Case: Beneficial for generating coherent content, such as reports or aligning with internal guidelines.
Effectiveness: Enhances contextual relevance, making it suitable for conversational or creative tasks.
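In the language-modeling style, by contrast, the loss is kept over the entire sequence, prompt and completion alike; the only change to the sketch above is to skip the masking:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder, as above

def build_full_sequence_labels(prompt, completion):
    full = tokenizer(prompt + completion, return_tensors="pt")
    # No masking: every token, prompt included, contributes to the loss,
    # exactly like pre-training but on a smaller, domain-specific corpus.
    full["labels"] = full["input_ids"].clone()
    return full
```

This is also why it feels so close to pre-training: the objective is identical, and the practical differences are the starting checkpoint, the data size, and the data mix.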
Considerations:
Data Preparation: Requires a substantial amount of labeled data, which can be resource-intensive to produce, though enterprises often already have internal documents and Q&A records that can be converted into training examples.
Pipeline: A typical fine-tuning pipeline has seven stages (data preparation, model selection, training, validation, testing, deployment, and monitoring), each tailored to enterprise needs.
Model Alignment: Ensures the model aligns with organizational values and standards, crucial for compliance and consistency, especially in regulated industries.
Conclusion:
Both methods have their advantages and are suitable for different use cases.
A combination of methods might be beneficial but could complicate the training process.
Further research into detailed comparisons or case studies is recommended to determine the best approach based on specific enterprise goals and contexts.
Incorporating these approaches effectively can enhance the LLM’s domain expertise, improving its utility within the enterprise framework.