I created a DatasetDict and pushed it here.
I’m getting the message
Status code: 400
Message: could not get the config name for this dataset
Am I supposed to create a config file somewhere that I missed so the dataset viewer works?
Hey @dansbecker, as far as I know, the dataset viewer currently supports datasets with a loading script (example) or raw data files in common formats like JSON and CSV (example).
I agree that being able to view the contents of DatasetDict objects would be a nice feature, so I’m tagging @lhoestq and @severo in case they have any additional insights here.
Hi! Please use my_dataset.push_to_hub() to save your dataset on the Hub. Then, to reload it, you can use load_dataset(). You can see the documentation here.
Datasets saved with save_to_disk and uploaded manually to the Hub are not supported (yet). This is because saving locally uses the Arrow format: while this format allows a dataset to be immediately reloaded, it’s not the preferred format to store in the cloud since it’s uncompressed (it requires more bandwidth).
Do you all have a preference for continuing the conversation here vs in the issue?
I’d like to make sure I understand the advice above:
I tried the steps @lhoestq suggested:
repo_url = 'https://huggingface.co/datasets/dansbecker/hackernews_hiring_posts'
repo = Repository(local_dir=".", clone_from=repo_url)
That gives an error:
AttributeError: 'Repository' object has no attribute 'split'
I assume the split attribute is specified in the loading script that @lewtun mentioned? I see a _split_generators method in that example loading script. Does that create the split attribute in some way I’m missing?
The repo argument should be of type str, so try this instead:
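A minimal sketch of that fix, assuming the repo id is the namespace/name part of the URL above and that you are already logged in to the Hub. The helper names repo_id_from_url and push are hypothetical and only there to make the example self-contained; the point is that push_to_hub receives a plain string rather than a Repository object.

```python
from urllib.parse import urlparse


def repo_id_from_url(repo_url: str) -> str:
    """Hypothetical helper: reduce a Hub dataset URL to a 'namespace/name' repo id."""
    path = urlparse(repo_url).path.strip("/")
    prefix = "datasets/"
    if path.startswith(prefix):
        path = path[len(prefix):]
    return path


def push(dataset_dict, repo_url: str) -> None:
    """Hypothetical wrapper: push a DatasetDict using a str repo id."""
    # push_to_hub expects the repo id as a str, not a Repository instance.
    dataset_dict.push_to_hub(repo_id_from_url(repo_url))


repo_url = "https://huggingface.co/datasets/dansbecker/hackernews_hiring_posts"
repo_id = repo_id_from_url(repo_url)
```

With a real DatasetDict called my_dataset, this amounts to calling my_dataset.push_to_hub("dansbecker/hackernews_hiring_posts") directly; no Repository clone is needed for that call.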
Thanks @mariosasko. That works great.