Multiple Custom PyTorch Datasets

John6666 · January 26, 2025, 6:27am

The HF libraries usually run on a torch backend, so I don’t expect any problems with a dataset loading script written in torch’s Dataset format, and if that’s what you’re familiar with, it’s a reasonable way to go. That said, I think the current recommendation is to build the dataset with the DatasetBuilder class from the Hugging Face datasets library wherever possible.
There also seems to be a class for defining splits. I couldn’t find a good English how-to page, so I’ll point to a Japanese one instead; the flow should be clear even in translation, and you can follow it roughly from the code and class names alone. I don’t know whether it will fit your dataset.
If your data doesn’t fit the existing framework well, it may be faster and less error-prone to upload it as is, in two or three parts, rather than forcing it to fit.
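
For what it’s worth, here is a rough sketch of one lighter-weight way to bridge the two formats. It is not the DatasetBuilder route mentioned above; it swaps in `Dataset.from_generator` from the datasets library, which I believe is enough for simple cases. `ToyTorchStyleDataset`, `iter_examples` and `to_hf_dataset_dict` are hypothetical names made up for illustration, the toy class only stands in for a real `torch.utils.data.Dataset`, and the last function assumes the `datasets` package is installed.

```python
from typing import Any, Dict, Iterator, Mapping, Sequence


class ToyTorchStyleDataset:
    """Hypothetical stand-in for a map-style torch Dataset (__len__ + __getitem__)."""

    def __init__(self, texts: Sequence[str], labels: Sequence[int]) -> None:
        self.texts = list(texts)
        self.labels = list(labels)

    def __len__(self) -> int:
        return len(self.texts)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        return {"text": self.texts[idx], "label": self.labels[idx]}


def iter_examples(ds) -> Iterator[Dict[str, Any]]:
    """Yield one dict per example; works for any object with __len__ and __getitem__."""
    for i in range(len(ds)):
        yield ds[i]


def to_hf_dataset_dict(parts: Mapping[str, Any]):
    """Turn {"train": ds_a, "test": ds_b, ...} into a DatasetDict, one split per part.

    Assumption: the `datasets` package is installed. It is imported lazily so the
    rest of this sketch runs without it.
    """
    from datasets import Dataset, DatasetDict

    return DatasetDict(
        {
            name: Dataset.from_generator(iter_examples, gen_kwargs={"ds": ds})
            for name, ds in parts.items()
        }
    )
```

With two or three custom datasets you would pass them in as separate splits, e.g. `to_hf_dataset_dict({"train": train_ds, "test": test_ds})`. Whether this fits depends on your data; if each item returns tensors rather than plain Python values, you will probably need to convert them inside `iter_examples` first.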
