Problem accessing dataset

Hi,
I have just tried to create my first dataset on the Hugging Face website.
After uploading a few data files (three .xlsx and one .csv), I see the following message:

However, when I click on the “Files” tab, I can see all my files.
When I try to load the dataset from a Python Jupyter notebook, I receive the following error message:


FileNotFoundError: Couldn’t find a dataset script at /content/Lord-Goku/testing_1/testing_1.py or any data file in the same directory. Couldn’t find ‘Lord-Goku/testing_1’ on the Hugging Face Hub either: FileNotFoundError: Unable to resolve any data file that matches [‘train[-._ 0-9/]', '[-._ 0-9/]train[-._ 0-9/]', 'training[-._ 0-9/]’, ‘[-._ 0-9/]training[-._ 0-9/]’] in dataset repository Lord-Goku/testing_1 with any supported extension

Even though the dataset is public, I logged in to my Hugging Face account from the notebook to see whether that made any difference, but no luck.

Here is the code I have tried:
from datasets import load_dataset
dataset = load_dataset("Lord-Goku/testing_1")

and:
data_files = {"test": "test.xlsx"}
dataset = load_dataset("Lord-Goku/testing_1", data_files=data_files)

Neither of them works.

Hi!

For the csv files, you can do:

from datasets import load_dataset

ds = load_dataset("Lord-Goku/testing_1", data_files="nyse-listed.csv")

I don’t think we have a BuilderConfig for xlsx files, so you can do this instead:

from datasets import Dataset
import pandas as pd

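# pd.read_excel already returns a DataFrame, so no extra conversion is needed
# (reading .xlsx files requires the openpyxl package to be installed)
df = pd.read_excel("https://huggingface.co/datasets/Lord-Goku/testing_1/resolve/main/test.xlsx")
dataset = Dataset.from_pandas(df)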

Thank you so much for your prompt reply.
You are totally right. I re-read the documentation and realized that Hugging Face does not support .xlsx files.
I was able to load my CSV file correctly using your code.
Appreciate your support. :innocent:

The only issue is that I still see the following message:
"The dataset is currently empty. [Upload or create new data files]. Then, you will be able to explore them in the Dataset Viewer."
Any idea why this is?

I think this might be because your repository isn’t structured properly (you’ve got a mix of xlsx and csv files). Can you try organizing your csv files as shown here?

Also, you can create a dataset with xlsx files, but you’ll need to write a loading script which is a bit more involved than just uploading your dataset files :slight_smile:
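
To make the first suggestion a bit more concrete: the error message earlier in the thread shows the loader searching for file names that contain keywords such as `train` or `training`. Below is a minimal sketch, assuming a flat repository, of how one could group CSV files into splits by name and skip the xlsx files. Note that `build_data_files`, the keyword table, and the file name `test.csv` are hypothetical and only for illustration; they are not part of the `datasets` library, and the library's own matching rules may differ.

```python
import re
from collections import defaultdict

# Illustrative keyword table (an assumption, not the library's own list).
SPLIT_KEYWORDS = {
    "train": ("train", "training"),
    "validation": ("validation", "valid", "dev"),
    "test": ("test", "testing"),
}


def build_data_files(filenames, default_split="train"):
    """Group CSV file names into splits based on keywords in their names.

    Hypothetical helper: files without a recognizable keyword fall back to
    `default_split`, and non-CSV files (e.g. xlsx) are skipped.
    """
    mapping = defaultdict(list)
    for name in sorted(filenames):
        if not name.lower().endswith(".csv"):
            continue  # skip xlsx and any other non-CSV format
        stem = name.lower().rsplit(".", 1)[0]
        # Split the stem on the same kind of separators seen in the error message.
        tokens = re.split(r"[-._ /0-9]+", stem)
        split = next(
            (s for s, keywords in SPLIT_KEYWORDS.items()
             if any(token in keywords for token in tokens)),
            default_split,
        )
        mapping[split].append(name)
    return dict(mapping)


files = ["nyse-listed.csv", "test.xlsx", "test.csv"]
data_files = build_data_files(files)
print(data_files)
```

The resulting dictionary can then be passed as `data_files=data_files` to `load_dataset("Lord-Goku/testing_1", ...)`, provided the listed CSV files actually exist in the repository.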

Thanks,
You were right. It seems to be in order now.
Appreciate your feedback. :nerd_face:
