Hey, I’m new to LLMs, and I’m not sure if this is the right category. I’ve come across many datasets containing text from sources like Wikipedia, the web, books, and documentation. However, I don’t understand how to train a chatbot model on text that doesn’t include question-and-answer pairs.
Additionally, I’m curious about how companies like OpenAI and Meta train their models using text from the internet.
One more question (I know I’m asking a lot): how can I speed up training a text classifier on a dataset with 441k rows?
Apologies for any mistakes; English isn’t my native language (I speak French).