What format should the data be in for pre-training? In my case, could it be any raw data (e.g., news articles), with the task-specific format (e.g., for classification) only being defined later, when I fine-tune?
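To make the question concrete, here is a minimal sketch of the two layouts I have in mind, assuming pre-training is self-supervised on plain text and fine-tuning adds a label per example. The field names `text` and `label`, the sample sentences, and the label names are all my own placeholders, not a format required by any particular library.

```python
import json

# Hypothetical raw news articles; for pre-training only the text itself is kept.
articles = [
    "The central bank held interest rates steady on Tuesday.",
    "The home team won the championship after extra time.",
]

# Pre-training (self-supervised): unlabeled text, one record per article.
pretrain_records = [{"text": a} for a in articles]

# Fine-tuning for classification: the same kind of text plus a task label.
# The label names below are made up for illustration.
label2id = {"business": 0, "sports": 1}
finetune_records = [
    {"text": articles[0], "label": label2id["business"]},
    {"text": articles[1], "label": label2id["sports"]},
]


def to_jsonl(records):
    """Serialize a list of dicts as JSON Lines (one JSON object per line)."""
    return "\n".join(json.dumps(r) for r in records)


pretrain_jsonl = to_jsonl(pretrain_records)
finetune_jsonl = to_jsonl(finetune_records)

print(pretrain_jsonl)
print(finetune_jsonl)
```

If that understanding is right, the raw articles would need no task-specific structure for pre-training, and the labels would only come in at the fine-tuning stage.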