Using the same instruction format for fine-tuning: is this bad for the model?

Hi, I'm currently fine-tuning a Mistral-7B-Instruct model on my custom dataset. The instructions are constructed in 3 formats:

- Provide an appropriate title for the following abstract of an article: ['abstract']
- Give me a list of words that can appropriately fill in [MASK] in the following abstract: ['abstract']
- Summarize the following article: [article]
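For concreteness, here is a minimal sketch of how I assemble these formats into instruction-answer pairs (function and field names like `make_example`, `"instruction"`, and `"output"` are just illustrative, not from any particular library):

```python
# Three prompt templates, one per task. Wording mirrors the formats above.
TEMPLATES = {
    "title": "Provide an appropriate title for the following abstract of an article: {abstract}",
    "mask": "Give me a list of words that can appropriately fill in [MASK] in the following abstract: {abstract}",
    "summary": "Summarize the following article: {article}",
}

def make_example(task, text, answer):
    """Render one instruction-answer pair for a given task."""
    field = "article" if task == "summary" else "abstract"
    return {
        "instruction": TEMPLATES[task].format(**{field: text}),
        "output": answer,
    }

def build_dataset(records):
    """records: iterable of (task, text, answer) tuples.
    For the mixed-format run I just interleave all three tasks here."""
    return [make_example(task, text, answer) for task, text, answer in records]
```

The single-format runs use only one task key; the mixed run passes records from all three.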

There are about 1,000 examples for each instruction format. I tried fine-tuning on a single instruction format, and also on a mixed dataset containing all 3 formats. The results are not great. The cause may lie in other hyperparameters I'm using, but I have questions beyond that.

In [Zhou, Chunting, et al. "LIMA: Less Is More for Alignment." arXiv preprint arXiv:2305.11206 (2023)], the authors argue that ~1k examples are sufficient if the prompts are diverse enough and well curated. However, the model they tuned is a general-purpose assistant, not one for a specific task. Suppose I want to tune the model for a specific task, or give it expert knowledge in a particular field, and no dataset exists, so I have to build the whole instruction-answer dataset myself. Can I get good results when the instruction-diversity condition is not satisfied? Has anyone gotten good results with a not-so-small dataset (at least 2k examples) constructed in only one instruction format?