Wikipedia (or something else) text to input output

Bigfoot302 · November 15, 2024, 6:47pm

Hey, I’m new to LLMs. I’m not sure if this is the right category, but I’ve come across many datasets containing text from sources like Wikipedia, the web, books, and documentation. However, I’m unsure how to train a chatbot model on text that doesn’t include question-and-answer pairs.
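On the first question: text without question/answer pairs is commonly used for causal language modeling (next-token prediction). Each window of text is its own input, and the target is the same window shifted by one token, so no labels are needed. Below is a minimal plain-Python sketch of that input/output construction. It assumes a toy whitespace tokenizer (real models use subword tokenizers such as BPE), and the names `build_vocab`, `make_next_token_pairs`, and `context_len` are hypothetical, chosen only for illustration.

```python
# Toy sketch: turn raw text (no Q&A pairs) into (input, target) pairs for
# next-token prediction. Word-level tokenization is an illustrative stand-in
# for the subword tokenizers real models use.


def build_vocab(words):
    """Map each distinct word to an integer id, in order of first appearance."""
    return {w: i for i, w in enumerate(dict.fromkeys(words))}


def make_next_token_pairs(token_ids, context_len):
    """Cut a token stream into windows of `context_len` input tokens.

    The target for each window is the same window shifted left by one token,
    so position t is trained to predict token t + 1. Tokens at the end that
    cannot fill a whole window are dropped.
    """
    pairs = []
    for start in range(0, len(token_ids) - context_len, context_len):
        chunk = token_ids[start:start + context_len + 1]
        pairs.append((chunk[:-1], chunk[1:]))
    return pairs


text = "the cat sat on the mat and the dog sat on the rug"
words = text.split()
vocab = build_vocab(words)
token_ids = [vocab[w] for w in words]
pairs = make_next_token_pairs(token_ids, context_len=4)

# Decode the ids back to words to show what each example looks like.
id_to_word = {i: w for w, i in vocab.items()}
for inputs, targets in pairs:
    print(" ".join(id_to_word[i] for i in inputs), "->",
          " ".join(id_to_word[i] for i in targets))
# first line printed: the cat sat on -> cat sat on the
```

A model trained on such windows learns to continue text. Chat-style behavior is usually added in a later step by fine-tuning the same model on dialogue or instruction data.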

Additionally, I’m curious about how companies like OpenAI and Meta train their models using text from the internet.
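On the second question: published descriptions of models such as GPT-3 and Llama describe a self-supervised pretraining stage on very large collections of web pages, books, code, and Wikipedia, filtered for quality and deduplicated. Chat versions are then fine-tuned on instruction or dialogue data, in some cases also with reinforcement learning from human feedback. Pretraining needs no labels because the text supplies its own targets: for a token sequence x_1, ..., x_T and a model p_θ, training typically minimizes the average negative log-likelihood of each token given the tokens before it:

```latex
\mathcal{L}(\theta) = -\frac{1}{T} \sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_1, \ldots, x_{t-1}\right)
```

In practice this is the cross-entropy between the model's predicted distribution at each position and the actual next token, averaged over all positions in a batch.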

One more question (I know I’m asking a lot): how can I speed up training a text classifier on a dataset with 441k rows?
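On the third question: one frequent source of wasted compute when training a text classifier is padding, because every batch is padded to its longest example. Grouping examples of similar length into the same batch reduces that waste. The stdlib-only sketch below counts the token slots processed with random batches versus length-grouped batches. It uses synthetic lengths (the 441k-row dataset itself is not shown), and the helpers `padded_tokens` and `make_batches` are hypothetical names for illustration.

```python
import random

# Sketch: why grouping examples of similar length speeds up training.
# Each batch is padded to its longest example, so padding is wasted compute.
# Lengths here are synthetic token counts, not real data.


def padded_tokens(batches):
    """Token slots processed when every batch is padded to its longest example."""
    return sum(len(batch) * max(batch) for batch in batches)


def make_batches(lengths, batch_size, group_by_length, seed=0):
    """Split example lengths into batches, optionally grouping similar lengths."""
    rng = random.Random(seed)
    order = list(range(len(lengths)))
    rng.shuffle(order)
    if group_by_length:
        # Sort so neighbouring examples have similar lengths and need little padding.
        order.sort(key=lambda i: lengths[i])
    batches = [
        [lengths[i] for i in order[start:start + batch_size]]
        for start in range(0, len(order), batch_size)
    ]
    if group_by_length:
        # Shuffle the batch order so training does not see all short examples first.
        rng.shuffle(batches)
    return batches


data_rng = random.Random(1)
lengths = [data_rng.randint(5, 256) for _ in range(4096)]  # synthetic lengths
real = sum(lengths)
for grouped in (False, True):
    total = padded_tokens(make_batches(lengths, 32, group_by_length=grouped))
    print(f"group_by_length={grouped}: {total} token slots "
          f"({total / real:.2f}x the real tokens)")
```

If you use the Hugging Face `transformers` Trainer, the same idea is available through dynamic padding with `DataCollatorWithPadding` and the `group_by_length=True` option of `TrainingArguments`. Lowering `max_length` when tokenizing, enabling mixed precision (`fp16=True`, or `bf16=True` on supported GPUs), and starting from a smaller model such as DistilBERT usually reduce training time further. The actual gain depends on how your text lengths are distributed.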

Apologies for any mistakes; English isn’t my native language (I speak French).

Related topics

Topic	Category	Replies	Views	Activity
Repost: Wikipedia (or something else) text to input output	Beginners	3	276	November 18, 2024
Training existing llm on my data	Beginners	0	502	June 17, 2023
How to Train on Corpus of Text w/o splitting into Q&A JSON	🤗Datasets	0	116	March 30, 2024
Request for Further Information on Datasets	Beginners	0	283	November 26, 2020
Train GPT2/3 on social media posts and comments (reddit/Facebook etc)	Flax/JAX Projects	4	453	June 29, 2021