Greetings,
I'd like to do simple experiments on a programming language dataset, so I'm looking for something like code-parrot but way smaller.
I'm new to the datasets library, so I wanted to ask: is there a way to automatically get a small fraction of an existing dataset, e.g. by adding some flag or "small" to the dataset name?
Also, a dataset I found on the Hugging Face Hub appears to be broken. Should this be reported as an issue? Here is the code and the error it produces:
from datasets import load_dataset
dataset = load_dataset("formermagic/github_python_1m") # Error:
# FileNotFoundError: Couldn't find a dataset script at /Users/USERNAME/PycharmProjects/parrot/formermagic/github_python_1m/github_python_1m.py or any data file in the same directory. Couldn't find 'formermagic/github_python_1m' on the Hugging Face Hub either: FileNotFoundError: The dataset repository at 'formermagic/github_python_1m' doesn't contain any data file.
Cheers,
Ilya