How to train a model on multiple datasets

I built a translation model by following the manual, but the quality is not very good. I wanted to train it on several datasets, but I found that they do not add to each other: training on each new dataset overwrites what the model learned from the previous one. How can I train on several datasets at once?

More generally, is this the right way to approach the problem?

The Hugging Face datasets library has an interleave_datasets function you could check out; it combines several datasets into one. There is also concatenate_datasets if you just want to append them.

More generally, is this the right way to approach the problem?

Using more data may well help, but it is hard to say without more context. Many things can make a model good or bad.