How to use dataset in h5 format?

emabjk · April 5, 2023, 8:22am

We recently created a dataset in h5 format. We have one main h5 file with all data indexed by “id”. For the train, validation and test splits we have separate txt files listing which ids belong to the train, val or test set. How can we use _split_generators and _generate_examples in this setting without creating separate h5 files for train, val and test, just reading the ids from the corresponding txt files?
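
One possible approach, as a minimal sketch rather than a confirmed recipe: keep the single h5 file and let each split read only the ids listed in its txt file. The sketch assumes one id per line in each txt file and one HDF5 dataset per id at the top level of the h5 file; the function names `read_ids`, `examples_for_ids` and `generate_examples` and the `"data"` field are hypothetical and would need to match the real layout.

```python
def read_ids(ids_path):
    """Return the ids in a split file, assuming one id per line."""
    with open(ids_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def examples_for_ids(store, ids):
    """Yield (key, example) pairs for the requested ids only.

    `store` is anything indexable by id; an open h5py.File behaves this way.
    """
    for example_id in ids:
        item = store[example_id]
        # An h5py dataset has a shape and is read into memory with [()];
        # plain Python values are passed through unchanged.
        value = item[()] if hasattr(item, "shape") else item
        yield example_id, {"id": example_id, "data": value}


def generate_examples(h5_path, ids_path):
    """Open the shared h5 file and yield only the examples of one split."""
    import h5py  # imported here so the helpers above work without h5py

    ids = read_ids(ids_path)
    with h5py.File(h5_path, "r") as h5_file:
        yield from examples_for_ids(h5_file, ids)
```

In a loading script, `_split_generators` could then return one `datasets.SplitGenerator` per split, each with `gen_kwargs` that pass the same h5 path and a different txt path (for example `name=datasets.Split.TRAIN` with the train ids file), and `_generate_examples(self, h5_path, ids_path)` could simply do `yield from generate_examples(h5_path, ids_path)`. That way no per-split h5 files are needed.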
