Model Tuning and Re-Tuning Problems

:rocket: Working on Fine-Tuning LLMs – Need Some Expert Advice! :robot:

I’ve been experimenting with model training using both general user data and tool-specific data from our platform.
Initially, I combined (shuffled) both types of data for fine-tuning. Later, I shifted to a phased learning approach:

:right_arrow: Phase 1: Fine-tuned the model with general platform data.
:right_arrow: Phase 2: Re-tuned it with only tool-related data using adapters from Phase 1.
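For context, the two data strategies can be sketched roughly like this (the records and field names below are hypothetical, just to illustrate the split):

```python
import random

# Hypothetical training examples; "kind" marks the data source.
general_data = [{"kind": "general", "text": f"general example {i}"} for i in range(4)]
tool_data = [{"kind": "tool", "text": f"tool example {i}"} for i in range(4)]

# Strategy 1: combine and shuffle everything for a single fine-tuning run.
combined = general_data + tool_data
random.shuffle(combined)

# Strategy 2: phased learning, two separate passes.
phase_1 = general_data  # fine-tune the base model on general platform data
phase_2 = tool_data     # re-tune, reusing the Phase 1 adapters
```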

Despite this structured approach, I’m still facing a few recurring issues:

:one: Why does the model randomly return the system prompt (i.e., the one provided in the system role) in its replies?
:two: Why does it ask for tool details even during general greetings or unrelated platform queries?
:three: Why does it miss some required fields when constructing tool calls?
:four: Why does it invent new tool parameters not defined in the schema?
:five: Why doesn’t it ask for all required fields in plain text consistently?
:six: After a tool call, when the same query is asked again, why does it repeat its previous answer instead of asking for the required fields again?

:magnifying_glass_tilted_left: If anyone has tackled similar issues while fine-tuning LLMs (e.g., using LoRA adapters or phased training), I’d love to hear your thoughts or tips!

:speech_balloon: Feel free to comment or DM—any insights are truly appreciated. Thanks in advance!


Did your fine-tuning data contain the system prompts, so the model was trained on seeing them?
You could add a line to the system prompt telling the model not to repeat it (at inference), or strip it out of the output. I'd probably use the first method to save token count.
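If you go the stripping route, a minimal post-processing sketch (assuming you have the system prompt string available at inference time; the strings here are made up):

```python
def strip_system_prompt(reply: str, system_prompt: str) -> str:
    """Remove a leaked copy of the system prompt from the model's reply."""
    return reply.replace(system_prompt, "").strip()

# Hypothetical example of a leaked system prompt in the output.
system_prompt = "You are a helpful platform assistant."
reply = "You are a helpful platform assistant. Sure, I can help with that!"
print(strip_system_prompt(reply, system_prompt))  # "Sure, I can help with that!"
```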

  1. I think you are encountering a default behavior learned from your data: the model can’t differentiate between user greetings and user queries that are actually about the tools. I would add synthetic data containing normal NLP interactions between model and user to your Phase 2 set to offset this. You could also add a label to the prompt indicating whether the user is actually asking for a tool, so the model can decide whether to output tool metadata.
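A rough sketch of that mixing step (the example turns, the ratio, and the `intent` label are assumptions you would tune for your own data):

```python
import random

# Hypothetical Phase 2 examples: tool-use turns plus synthetic small talk.
tool_examples = [
    {"user": "Create a report for project X", "intent": "tool"},
    {"user": "Export my dashboard as PDF", "intent": "tool"},
]
chitchat_examples = [
    {"user": "Hi there!", "intent": "chat"},
    {"user": "Good morning, how are you?", "intent": "chat"},
]

# Mix synthetic chit-chat into Phase 2 so a tool call is not the default reply.
phase_2 = tool_examples + chitchat_examples
random.shuffle(phase_2)

# The "intent" label can be prepended to the prompt so the model learns to
# decide explicitly whether tool metadata should be produced.
for ex in phase_2:
    ex["prompt"] = f"[intent={ex['intent']}] {ex['user']}"
```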

  2. My assumption is that your data lacks a consistent schema format and the model has under-learned the schema. You probably have fields that are present in some training inputs while the schemas are trimmed in others. You could include both required and optional tool schemas consistently, and validate the output JSON to confirm correctness.
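For the validation step, a dependency-free sketch that checks required fields and rejects parameters not in the schema (the schema and payload here are hypothetical):

```python
import json

# Hypothetical tool schema: which parameters exist and which are required.
TOOL_SCHEMA = {
    "required": {"project_id", "report_type"},
    "optional": {"format"},
}

def validate_tool_call(raw_json: str) -> list[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    args = json.loads(raw_json)
    allowed = TOOL_SCHEMA["required"] | TOOL_SCHEMA["optional"]
    errors = [f"missing required field: {f}"
              for f in sorted(TOOL_SCHEMA["required"] - args.keys())]
    errors += [f"unknown parameter: {f}"
               for f in sorted(args.keys() - allowed)]
    return errors

# Flags the missing "report_type" and the invented "color" parameter.
print(validate_tool_call('{"project_id": "p1", "color": "red"}'))
```

This catches both failure modes in one pass: dropped required fields (issue 3) and invented parameters (issue 4).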

  3. Again, this looks like sample imbalance: the model hasn’t seen enough examples showing which fields it needs to ask for.

  4. You need to work on the cache/retrieval logic, or reissue the call as a new query.
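One way to sketch that cache fix: key the cache on conversation state rather than the query text alone, so the same question asked again after a tool call is treated as a fresh request (all names here are hypothetical):

```python
# Hypothetical response cache keyed on (query, state) rather than query alone.
cache: dict[tuple[str, int], str] = {}

def answer(query: str, tool_calls_so_far: int) -> str:
    key = (query, tool_calls_so_far)
    if key not in cache:
        # Placeholder for the real model call; including the state in the key
        # means a repeated query after a tool call is not served a stale reply.
        cache[key] = f"answer to {query!r} at state {tool_calls_so_far}"
    return cache[key]

first = answer("book a meeting", 0)
second = answer("book a meeting", 1)  # same query after a tool call
```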

Hope this helps :slight_smile:
