Hi, I’m fine-tuning Mistral 7B (QLoRA) for a very specialized task with the following structure:
- a paragraph-long prompt
- a two-paragraph assistant response
- another one-sentence prompt
- a structured JSON assistant response

The prompt portions will always be exactly the same across all examples.
Training is converging very quickly (500–600 steps with batch size 1, out of 5,000 training rows total), and I’m worried that’s because it’s overfitting on the fixed prompt portions.
My question is: should I
- write a custom loss function that ignores or down-weights the prompt tokens?
- do away with prompting altogether and hope the model learns the task organically?
- something else?
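For reference, here’s a rough sketch of what I mean by the first option: masking the fixed prompt tokens out of the loss by setting their labels to -100 (the `ignore_index` that PyTorch’s `CrossEntropyLoss` and Hugging Face causal-LM models use by default). The token IDs below are dummies; in practice they’d come from the Mistral tokenizer.

```python
# Sketch of option 1: only assistant tokens contribute to the loss.
# -100 is the default ignore_index for torch.nn.CrossEntropyLoss,
# and HF causal-LM models pass labels through to it unchanged.
IGNORE_INDEX = -100

def build_labels(prompt_ids, response_ids):
    """Labels for a single prompt+response pair: prompt positions are
    masked out, response positions are trained on normally."""
    return [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)

def build_multiturn_labels(turns):
    """turns: ordered list of (token_ids, is_assistant) segments.
    Only assistant segments keep their token IDs as labels; the fixed
    prompt segments are masked so they add nothing to the loss."""
    labels = []
    for ids, is_assistant in turns:
        labels += list(ids) if is_assistant else [IGNORE_INDEX] * len(ids)
    return labels
```

For my two-turn structure, that would mean masking both prompt segments and training only on the two assistant responses, e.g. `build_multiturn_labels([(p1, False), (r1, True), (p2, False), (r2, True)])`. Down-weighting instead of fully ignoring would need a custom loss with per-token weights rather than the -100 trick.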
Thanks!!