Looking for "How-to" on training with multiple files

I’ve gone through the Hugging Face training on how to train a model from a string. Is there any info or a tutorial on how to do it with multiple files?

Do I need to get the docs into one long string variable?
Do I need to split the files into single sentences instead of paragraphs?
Can I feed it one text file at a time and it continues to learn?

I’m looking for a starting point for my learning!


That’s a good question, and one that I can’t answer.

It depends on what kind of model you want, and many people are researching the best way to do it…
I think it would be quicker to ask where things currently stand on the HF Discord, in an NLP-related channel, ask-for-help, or general.
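
That said, here is a rough starting point for the "multiple files" part, independent of which model you pick. It is only a sketch using the Python standard library: it assumes your documents are UTF-8 `.txt` files with paragraphs separated by blank lines, and the helper names `iter_paragraphs` and `batched` and the `corpus` folder are made up for illustration. It shows that the files do not have to be merged into one long string; you can stream paragraphs file by file and hand them on in batches. Whether paragraphs or single sentences work better depends on the model and task, so treat the split rule as an assumption to adjust.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def iter_paragraphs(paths: Iterable[Path]) -> Iterator[str]:
    """Yield non-empty paragraphs from each file, one file at a time."""
    for path in paths:
        text = path.read_text(encoding="utf-8")
        # Assumption: paragraphs are separated by blank lines.
        for block in text.split("\n\n"):
            # Collapse internal line breaks and extra whitespace.
            paragraph = " ".join(block.split())
            if paragraph:
                yield paragraph


def batched(examples: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Group examples into lists of at most batch_size items."""
    batch: List[str] = []
    for example in examples:
        batch.append(example)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        # Emit the final, possibly shorter, batch.
        yield batch


if __name__ == "__main__":
    # Hypothetical folder name; point this at your own files.
    files = sorted(Path("corpus").glob("*.txt"))
    for batch in batched(iter_paragraphs(files), batch_size=8):
        # Each batch is a list of strings you could pass to a tokenizer.
        print(len(batch), batch[0][:60])
```

If you are using the Hugging Face `datasets` library, `load_dataset("text", data_files=[...])` also accepts a list of text files and gives you one example per line, which may save you from writing the loop yourself.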