Dataset preview not showing for uploaded DatasetDict

I created a DatasetDict and pushed it here.

I’m getting the message

Server Error
Status code:   400
Exception:     Status400Error
Message:       could not get the config name for this dataset

Am I supposed to create a config file somewhere that I missed so the dataset viewer works?


Hey @dansbecker, as far as I know, the dataset viewer currently supports datasets with a loading script (example) or raw data files in common formats like JSON and CSV (example).

I agree that being able to view the contents of DatasetDict objects would be a nice feature, so I’m tagging @lhoestq and @severo in case they have any additional insights here :slight_smile:

I created an issue: Dataset viewer issue for `dansbecker/hackernews_hiring_posts` (Issue #3392, huggingface/datasets on GitHub)


Hi! Please use my_dataset.push_to_hub() to save your dataset on the Hub. To reload it, you can then use load_dataset(). You can see the documentation here.

Datasets saved with save_to_disk and uploaded manually to the Hub are not supported (yet). This is because saving locally uses the Arrow format: while this format allows a dataset to be reloaded immediately, it’s not the preferred format for cloud storage because it’s uncompressed and therefore requires more bandwidth.

Do you all have a preference for continuing the conversation here vs in the issue?

I’d like to make sure I understand the advice above:

I tried the steps @lhoestq suggested

repo_url = ''
repo = Repository(local_dir=".", clone_from=repo_url)

That gives the error AttributeError: 'Repository' object has no attribute 'split'.

I assume the split attribute is specified in the loading script that @lewtun mentioned? I see a _split_generators method in that example loading script. Does that create the split attribute in some way I’m missing?


The repo argument should be of type str, so try this instead: all_datasets.push_to_hub("hackernews_hiring_posts")
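To illustrate where that AttributeError most likely comes from, here is a minimal sketch. It assumes, as the error message suggests, that the repo id is parsed with string methods such as split. FakeRepository and parse_repo_id are hypothetical stand-ins written for this example, not the library's actual code.

```python
class FakeRepository:
    """Hypothetical stand-in for any non-string object, e.g. a Repository."""


def parse_repo_id(repo_id):
    """Hypothetical helper: treat repo_id as a string and split on '/'."""
    parts = repo_id.split("/")  # fails if repo_id has no .split method
    return parts[-1]


def try_parse(repo_id):
    """Return the parsed name, or the error text if parsing fails."""
    try:
        return parse_repo_id(repo_id)
    except AttributeError as err:
        return f"AttributeError: {err}"


# A string works; an arbitrary object does not.
print(try_parse("hackernews_hiring_posts"))
print(try_parse(FakeRepository()))
```

The second call reports that the object has no attribute 'split', which mirrors the error above: passing the repo name as a string avoids it.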

Thanks @mariosasko. That works great.