Using the same instruction format for fine-tuning: is this bad for the model?

Hi, I'm currently fine-tuning a Mistral-7B-Instruct model on my custom dataset. The instructions are constructed in 3 formats:

- Provide an appropriate title for the following abstract of an article: ['abstract']
- Give me a list of words that can appropriately fill in [MASK] in the following abstract: ['abstract']
- Summarize the following article: [article]
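For concreteness, here is a minimal sketch of how I assemble these formats into instruction-answer pairs (function and field names like `make_example`, `"instruction"`, and `"output"` are just illustrative, not from any particular library):

```python
# Three prompt templates, one per task. Wording mirrors the formats above.
TEMPLATES = {
    "title": "Provide an appropriate title for the following abstract of an article: {abstract}",
    "mask": "Give me a list of words that can appropriately fill in [MASK] in the following abstract: {abstract}",
    "summary": "Summarize the following article: {article}",
}

def make_example(task, text, answer):
    """Render one instruction-answer pair for a given task."""
    field = "article" if task == "summary" else "abstract"
    return {
        "instruction": TEMPLATES[task].format(**{field: text}),
        "output": answer,
    }

def build_dataset(records):
    """records: iterable of (task, text, answer) tuples.
    For the mixed-format run I just interleave all three tasks here."""
    return [make_example(task, text, answer) for task, text, answer in records]
```

The single-format runs use only one task key; the mixed run passes records from all three.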

There are about 1,000 examples for each instruction format. I tried fine-tuning on a single instruction format, and also on a mixed dataset containing all 3 formats. The results are not great. The cause may lie in other hyperparameters I'm using, but I have questions beyond that.

In [Zhou, Chunting, et al. "LIMA: Less Is More for Alignment." arXiv preprint arXiv:2305.11206 (2023)], the authors argue that ~1k examples are sufficient if the prompts are diverse enough and well curated. However, the model they tuned is a general-purpose assistant, not one for a specific task. Suppose I want to tune the model for a specific task, or give it expert knowledge in a particular field, and no dataset exists, so I have to build the whole instruction-answer dataset myself. Can I get good results when the instruction-diversity condition is not satisfied? Has anyone gotten good results with a not-so-small dataset (at least 2k examples) constructed in only one instruction format?