Dataset preview not showing for uploaded DatasetDict

dansbecker · December 7, 2021, 4:13am

I created a DatasetDict and pushed it here.

I’m getting the message

Server Error
Status code:   400
Exception:     Status400Error
Message:       could not get the config name for this dataset

Am I supposed to create a config file somewhere that I missed so the dataset viewer works?

Thanks!

lewtun · December 7, 2021, 8:27am

Hey @dansbecker, as far as I know, the dataset viewer currently supports datasets with a loading script (example) or raw data files in common formats like JSON and CSV (example).

I agree that being able to view the contents of DatasetDict objects would be a nice feature, so I’m tagging @lhoestq and @severo in case they have any additional insights here.

severo · December 7, 2021, 8:41am

I created an issue on GitHub: Dataset viewer issue for `dansbecker/hackernews_hiring_posts` (Issue #3392, huggingface/datasets).

lhoestq · December 7, 2021, 11:45am

Hi! Please use my_dataset.push_to_hub() to save your dataset on the Hub. Then, to reload it, you can use load_dataset(). You can see the documentation here.

Datasets saved with save_to_disk and uploaded manually to the Hub are not supported (yet). This is because saving locally uses the Arrow format: while this format allows a dataset to be reloaded immediately, it’s not the preferred format for cloud storage, since it’s uncompressed and therefore requires more bandwidth.
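
As a rough illustration of the round trip described above, here is a minimal sketch. The helper name push_and_reload and its load_fn parameter are hypothetical (load_fn exists only so the function can be dry-run without network access), and the upload itself assumes you are already logged in to the Hub.

```python
def push_and_reload(dataset_dict, repo_id, load_fn=None):
    """Upload a dataset with push_to_hub(), then read the same repo back.

    dataset_dict: a datasets.Dataset or datasets.DatasetDict
    repo_id:      a plain string such as "username/dataset_name"
    load_fn:      hypothetical hook, defaults to datasets.load_dataset
    """
    if load_fn is None:
        # Imported lazily so the sketch can be read and dry-run
        # without the datasets library installed.
        from datasets import load_dataset
        load_fn = load_dataset

    # Uploads the data to the Hub; requires authentication.
    dataset_dict.push_to_hub(repo_id)

    # Reloads the dataset from the Hub by the same repo id.
    return load_fn(repo_id)
```

With a real DatasetDict this would be called as push_and_reload(all_datasets, "dansbecker/hackernews_hiring_posts").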

dansbecker · December 7, 2021, 1:10pm

Do you all have a preference for continuing the conversation here vs in the issue?

I’d like to make sure I understand the advice above:

I tried the steps @lhoestq suggested

repo_url = 'https://huggingface.co/datasets/dansbecker/hackernews_hiring_posts'
repo = Repository(local_dir=".", clone_from=repo_url)
all_datasets.push_to_hub(repo)

That gives the error AttributeError: 'Repository' object has no attribute 'split'

I assume the split attribute is specified in the loading script that @lewtun mentioned? I see a _split_generators method in that example loading script. Does that create the split attribute in some way I’m missing?

mariosasko · December 7, 2021, 1:27pm

Hi,

the repo argument should be of type str, so try this instead: all_datasets.push_to_hub("hackernews_hiring_posts")
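
For anyone wondering where the 'split' in the error message comes from: it is most likely the string method str.split being called on the repo argument, not anything related to dataset splits or loading scripts. The sketch below is only a guess at that behaviour, not the library's actual code; split_repo_id and FakeRepository are hypothetical names used for illustration.

```python
def split_repo_id(repo_id):
    """Split a 'namespace/name' string into its two parts (illustrative only)."""
    # str.split exists on strings; any other object without a
    # .split attribute raises AttributeError at this line.
    parts = repo_id.split("/")
    if len(parts) == 2:
        return parts[0], parts[1]
    # No namespace given, only a dataset name.
    return None, parts[0]


class FakeRepository:
    """Stand-in for a non-string object such as a Repository instance."""


# Passing a string works.
ok = split_repo_id("dansbecker/hackernews_hiring_posts")

# Passing a non-string object reproduces the same kind of error.
try:
    split_repo_id(FakeRepository())
except AttributeError as err:
    message = str(err)
```

Here message reads "'FakeRepository' object has no attribute 'split'", which mirrors the error reported above.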

dansbecker · December 7, 2021, 1:47pm

Thanks @mariosasko. That works great.
