We recently created a dataset in h5 format. We have one main h5 file with all data indexed by "id". For the train, validation, and test splits we have separate txt files listing which ids belong to the train, val, or test set. How can we use _split_generators and _generate_examples in this setting without creating separate h5 files for train, val, and test, just reading the ids from the corresponding txt files?