Create custom splits

I was looking at the imdb dataset script, and I noticed that it uses a custom split, "unsupervised". However, in my custom dataset, a custom split throws a TypeError. How do I resolve this?

 Traceback (most recent call last):
  File "/Users/home/.local/share/virtualenvs/JR745-rx6Pdnnr/bin/datasets-cli", line 8, in <module>
    sys.exit(main())
  File "/Users/home/.local/share/virtualenvs/JR745-rx6Pdnnr/lib/python3.8/site-packages/datasets/commands/datasets_cli.py", line 39, in main
    service.run()
  File "/Users/home/.local/share/virtualenvs/JR745-rx6Pdnnr/lib/python3.8/site-packages/datasets/commands/test.py", line 141, in run
    builder.download_and_prepare(
  File "/Users/home/.local/share/virtualenvs/JR745-rx6Pdnnr/lib/python3.8/site-packages/datasets/builder.py", line 860, in download_and_prepare
    self._download_and_prepare(
  File "/Users/home/.local/share/virtualenvs/JR745-rx6Pdnnr/lib/python3.8/site-packages/datasets/builder.py", line 953, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "/Users/home/.local/share/virtualenvs/JR745-rx6Pdnnr/lib/python3.8/site-packages/datasets/builder.py", line 1669, in _prepare_split
    split_info = self.info.splits[split_generator.name]
  File "/Users/home/.local/share/virtualenvs/JR745-rx6Pdnnr/lib/python3.8/site-packages/datasets/splits.py", line 530, in __getitem__
    instructions = make_file_instructions(
  File "/Users/home/.local/share/virtualenvs/JR745-rx6Pdnnr/lib/python3.8/site-packages/datasets/arrow_reader.py", line 112, in make_file_instructions
    name2filenames = {
  File "/Users/home/.local/share/virtualenvs/JR745-rx6Pdnnr/lib/python3.8/site-packages/datasets/arrow_reader.py", line 113, in <dictcomp>
    info.name: filenames_for_dataset_split(
  File "/Users/home/.local/share/virtualenvs/JR745-rx6Pdnnr/lib/python3.8/site-packages/datasets/naming.py", line 71, in filenames_for_dataset_split
    prefix = filename_prefix_for_split(dataset_name, split)
  File "/Users/home/.local/share/virtualenvs/JR745-rx6Pdnnr/lib/python3.8/site-packages/datasets/naming.py", line 54, in filename_prefix_for_split
    if os.path.basename(name) != name:
  File "/Users/home/anaconda3/lib/python3.8/posixpath.py", line 142, in basename
    p = os.fspath(p)
TypeError: expected str, bytes or os.PathLike object, not NoneType
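
For anyone reading the traceback: the last frames show `os.path.basename(name)` being called inside `filename_prefix_for_split`, and the error message says the value it received was `None`. The snippet below is only a minimal standard-library sketch of that final step. It does not use `datasets` at all, and it assumes nothing about why the name ended up as `None`.

```python
import os

# Reproduce only the last frame of the traceback: os.path.basename()
# rejects None because it expects a str, bytes, or os.PathLike object.
try:
    os.path.basename(None)
except TypeError as err:
    message = str(err)

print(message)
```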

Hi @sl02! This looks similar to this issue: Adding new splits to a dataset script with existing old splits info in metadata's `dataset_info` fails · Issue #5315 · huggingface/datasets · GitHub

Have you generated a README.md file for your dataset?

@polinaeterna
Yes, I did generate a README.md in the first iteration.

After removing the README.md and updating the script, it runs fine.
Thank you!
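
If you would rather keep the rest of the README.md, a possible alternative is to drop only the stale `dataset_info` entry from its YAML front matter. That entry is what the linked issue title points at. The sketch below is a rough, standard-library-only illustration: `strip_dataset_info` is a hypothetical helper, not part of `datasets`, and it assumes `dataset_info` is a top-level key whose contents are indented or written as list items.

```python
def strip_dataset_info(readme_text: str) -> str:
    """Return the README text without the top-level `dataset_info` key
    in its YAML front matter. Text without front matter is returned as-is."""
    lines = readme_text.splitlines(keepends=True)
    # Front matter must open on the very first line with '---'.
    if not lines or lines[0].strip() != "---":
        return readme_text
    # Locate the closing '---' of the front matter.
    try:
        end = next(i for i in range(1, len(lines)) if lines[i].strip() == "---")
    except StopIteration:
        return readme_text

    kept = []
    skipping = False
    for line in lines[1:end]:
        if line.startswith("dataset_info:"):
            skipping = True  # drop the key itself
            continue
        if skipping and (line[:1] in (" ", "\t", "-") or not line.strip()):
            continue  # drop nested mappings, list items, and blank lines
        skipping = False
        kept.append(line)
    return "".join([lines[0], *kept, *lines[end:]])
```

You would read README.md, pass its text through the helper, write the result back, and then rerun the same `datasets-cli` command so the metadata is regenerated from the updated script. Deleting the whole README.md, as done above, remains the simpler route.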

I solved this issue by setting a different datasets.DatasetInfo.description depending on the splits (or other arguments) I might have for that dataset.