Hi! You can define two SplitGenerator objects, one for train and one for test, pass that file to each of them, and implement the splitting in _generate_examples.
Here is a code skeleton you can use:
def _split_generators(self, dl_manager):
    ...
    return [
        # Both splits receive the same file; the `split` kwarg tells them apart.
        datasets.SplitGenerator(name="train", gen_kwargs={"data_file": data_file, "split": "train"}),
        datasets.SplitGenerator(name="test", gen_kwargs={"data_file": data_file, "split": "test"}),
    ]

def _generate_examples(self, data_file, split):
    # split data based on the `split` value
    ...
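
To make the splitting step more concrete, here is a minimal sketch of what the body of _generate_examples could look like. It is written as a standalone function so it runs without the library. The file format (one example per line), the `test_every` parameter, and the `"text"` field are assumptions made for illustration, so adapt them to your data. The rule here is deterministic: every fifth line goes to test and the rest go to train, so both splits stay consistent across runs.

```python
import os
import tempfile


def generate_examples(data_file, split, test_every=5):
    """Yield (key, example) pairs for the requested split.

    Assumptions (hypothetical, for illustration only): the file holds one
    example per line, and every `test_every`-th line belongs to the test split.
    """
    with open(data_file, encoding="utf-8") as f:
        for idx, line in enumerate(f):
            is_test = idx % test_every == 0
            # Keep the row only if it belongs to the split being generated.
            if (split == "test") == is_test:
                yield idx, {"text": line.rstrip("\n")}


# Small demo on a temporary 10-line file.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("\n".join(f"example {i}" for i in range(10)))
    demo_path = tmp.name

train_rows = list(generate_examples(demo_path, "train"))
test_rows = list(generate_examples(demo_path, "test"))
os.remove(demo_path)
```

In the actual builder the same logic would go inside `_generate_examples(self, data_file, split)`, which also yields `(key, example)` pairs. Using the line index as the key keeps keys unique within each split.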