`train_test_split` with IterableDataset

lhoestq · January 23, 2023, 1:48pm

Yes, it’s not implemented right now, but it should be possible to implement a `train_test_split` over the dataset shards. Contributions are welcome if you’re interested in helping with this.

For now, I’d suggest defining two separate datasets: one with the train data files and one with the test data files.
