Hey @dansbecker, as far as I know the dataset viewer currently supports datasets with a loading script (example) or raw data files in common formats like JSON and CSV (example).
I agree that being able to view the contents of DatasetDict objects would be a nice feature, so I’m tagging @lhoestq and @severo in case they have any additional insights here.
Hi! Please use my_dataset.push_to_hub() to save your dataset on the Hub. Then, to reload it, you can use load_dataset(). You can see the documentation here.
Datasets saved with save_to_disk and uploaded manually to the Hub are not supported (yet). This is because saving locally uses the Arrow format: while this format allows a dataset to be immediately reloaded, it’s not the preferred format for cloud storage since it’s uncompressed and therefore requires more bandwidth.
That gives an error: AttributeError: 'Repository' object has no attribute 'split'
I assume the split attribute is specified in the loading script that @lewtun mentioned? I see a _split_generators method in that example loading script. Does that create the split attribute in some way I’m missing?
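One possible reading of this error, offered as a guess and not a confirmed diagnosis: the split in the message may be the ordinary string method str.split, not a dataset split. That would happen if code expecting a repository id string of the form "namespace/name" received a Repository object instead. The sketch below reproduces that failure mode with stand-ins only. The Repository class and parse_repo_id function here are hypothetical and are not the library's actual code.

```python
class Repository:
    """Hypothetical stand-in for a repository handle object (not the real class)."""

    def __init__(self, local_dir: str):
        self.local_dir = local_dir


def parse_repo_id(repo_id):
    """Hypothetical helper that expects a "namespace/name" string."""
    # str.split is a string method; any non-string argument without a
    # .split attribute raises AttributeError at this line.
    namespace, name = repo_id.split("/")
    return namespace, name


# Passing a string works as expected.
print(parse_repo_id("username/my_dataset"))

# Passing an object instead of a string fails on the .split call.
try:
    parse_repo_id(Repository("my_dataset"))
except AttributeError as err:
    print(f"AttributeError: {err}")
```

If that is what is going on, passing the repository id as a string (for example "username/my_dataset") instead of a Repository object should avoid the error, and the _split_generators method in the loading script would be unrelated.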