Dataset with no splits

Hello everyone,

I am currently creating a dataset for which splits make no semantic sense. It’s an Information Retrieval corpus that should not be split. I currently load the entire corpus into the “train” split because I’ve been copying what other dataset loading scripts do, and it works, but I would rather drop splits altogether for the corpus config.

Can I use other strings to name the splits? Can I leave out splits altogether for a given self.config.name?

Hi! By default you can use the train split, but feel free to name it whatever you want.

In practice, when you return the split generators in _split_generators(), you can pass whatever split name you want to datasets.SplitGenerator(name=...).


Hi,

Thank you for this information. Is it possible not to use any split at all?

I mean, calling load_dataset('name') without any split? I got an error when I tried to leave out the split argument.

Yes, in the medium term we plan to relax this requirement and make splits optional (basically by merging the DatasetDict and Dataset classes).

At the moment you can just assume everything is in the “train” split for your use case.
