The backend of the HF libraries is usually torch, so I don't think a dataset loading script written around torch's Dataset format will cause any problems, and if you're familiar with it, that may be the better way to go. That said, I think the current recommendation is to build the dataset with the DatasetBuilder class in the Hugging Face datasets library wherever possible.
There also seems to be a class for setting splits. I couldn't find a good English how-to page, so I'll point you to a Japanese one; I think you can follow the flow even through a translation, or roughly from the code and class names alone. I don't know whether it will fit your dataset.
If it doesn't fit the existing framework well, it may be faster and less trouble to upload it as it is, in two or three parts, rather than forcing it to fit.
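If "parts" ends up meaning plain files, something like the following would do the splitting. This is only a sketch using the standard library; the function name `write_parts`, the JSONL format, and the file naming pattern are my own assumptions, not anything the Hub requires.

```python
import json
from pathlib import Path


def write_parts(records, out_dir, n_parts=3, prefix="part"):
    """Write a list of dict records into n_parts contiguous JSONL files.

    Hypothetical helper: earlier parts get one extra record when the
    total does not divide evenly.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    base, extra = divmod(len(records), n_parts)
    paths = []
    start = 0
    for i in range(n_parts):
        size = base + (1 if i < extra else 0)
        path = out_dir / f"{prefix}-{i:05d}-of-{n_parts:05d}.jsonl"
        with path.open("w", encoding="utf-8") as f:
            for record in records[start:start + size]:
                f.write(json.dumps(record, ensure_ascii=False) + "\n")
        paths.append(path)
        start += size
    return paths
```

You could then upload the resulting files as they are, without writing any loading script.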