How to fine-tune on 3 datasets of very different sizes (very large to very small)

Dear HF,
I have an interesting problem and I would love some advice on it:

  • I have 3 datasets: 1 very large (a few GB), 1 medium (a few hundred MB), 1 small (a few MB)
  • I want to fine-tune a decoder-only LLM on them

I want the model to generalise well from the large one (pick up general concepts), fit more closely to the medium one (pick up some structure), then fit very closely to the small one (pick up the text structure well).

What is the best way to go about this?

  • Vary the learning rate?
  • Vary epochs?
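To make the two options concrete, here is a minimal sketch of the kind of staged schedule I have in mind, training on the large, then the medium, then the small dataset with a different learning rate and epoch count for each. Every name and number in it (`Stage`, `optimizer_steps`, the example counts, learning rates, epochs and batch size) is a placeholder I made up for illustration, not a recommendation and not something I have tested.

```python
import math
from dataclasses import dataclass


@dataclass
class Stage:
    """One fine-tuning stage. All values used below are hypothetical placeholders."""
    name: str
    num_examples: int
    learning_rate: float
    epochs: int


def optimizer_steps(stage: Stage, batch_size: int) -> int:
    """Number of optimizer steps for a stage, assuming no gradient accumulation."""
    steps_per_epoch = math.ceil(stage.num_examples / batch_size)
    return steps_per_epoch * stage.epochs


# Assumed ordering: largest dataset first, smallest last.
# The learning rates and epoch counts are guesses, which is exactly what I am asking about.
STAGES = [
    Stage("large", num_examples=1_000_000, learning_rate=2e-5, epochs=1),
    Stage("medium", num_examples=50_000, learning_rate=1e-5, epochs=2),
    Stage("small", num_examples=100, learning_rate=5e-6, epochs=3),
]

if __name__ == "__main__":
    for stage in STAGES:
        steps = optimizer_steps(stage, batch_size=8)
        print(f"{stage.name}: lr={stage.learning_rate}, epochs={stage.epochs}, steps={steps}")
```

Writing it out this way at least shows how lopsided the stages are in optimizer steps, which is part of why I am unsure whether learning rate or epochs is the right knob.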

If so, are there any good starting points? Any information or advice would be much appreciated, as I’m struggling to know where to start here!

All the best!

The small dataset is only around 100 examples. Training the LLM on it for 3 epochs gives good results, but I worry that more epochs would overfit too much. The large dataset could be huge in comparison!
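One way I could check the overfitting worry rather than guess at it is to hold out a few of the roughly 100 examples and stop when the held-out loss stops improving. Below is a rough sketch of that idea in plain Python. `split_holdout` and `best_epoch` are hypothetical helpers, the 10% hold-out fraction is an assumption, and the loss values in the usage lines are invented purely to show the mechanics, not real results.

```python
import random


def split_holdout(examples, holdout_fraction=0.1, seed=0):
    """Shuffle a copy of the examples and return (train, holdout) lists."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    n_holdout = max(1, round(len(shuffled) * holdout_fraction))
    return shuffled[n_holdout:], shuffled[:n_holdout]


def best_epoch(val_losses, patience=1):
    """Return the 1-based epoch with the lowest held-out loss seen before stopping.

    Stops scanning once the loss has failed to improve for `patience`
    consecutive epochs, which mimics simple early stopping.
    """
    best_loss = float("inf")
    best = 0
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best


if __name__ == "__main__":
    train, holdout = split_holdout(range(100))
    print(len(train), len(holdout))
    # Invented per-epoch held-out losses, for illustration only.
    print(best_epoch([2.1, 1.7, 1.5, 1.6, 1.4], patience=1))
```

With so few examples the held-out loss will be noisy, so I would treat this as a sanity check on the epoch count rather than a precise signal.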