Issue with huggingface.load_dataset()

Hello, I recently encountered an issue with the function huggingface.load_dataset() that raised the following error:

Error: KeyError: ‘tags’

In an attempt to resolve this, I found a tutorial (link here), which suggested updating the huggingface_hub library in the requirements.txt file. I followed the tutorial and updated the huggingface_hub version accordingly. However, after this update, I am now facing a different error.

--> ERROR: docker.io/library/python:3.1: not found

Can anyone help with understanding why this update is causing an issue and how I can fix it?
Space: QASports Website - Basketball - a Hugging Face Space by leomaurodesenv


Hi! It seems like you’ve encountered two distinct issues after updating the huggingface_hub library.

  1. KeyError: ‘tags’ – This error usually occurs when the dataset you’re trying to load has missing metadata or if you’re using an incompatible version of the dataset. It could also be caused by outdated versions of datasets or huggingface_hub. Updating both might help resolve this.

  2. ERROR: docker.io/library/python:3.1: not found – This error means the build is trying to pull a Docker image that does not exist. There is no python:3.1 image on Docker Hub (current Python releases are versions such as 3.10 or 3.11). You should update your Dockerfile or project configuration to use a valid image, such as python:3.8, python:3.9, or python:3.10.

Suggested Fix:

  • Update your Hugging Face libraries together so that their versions are compatible:
    pip install --upgrade huggingface_hub datasets
    
  • Correct the Docker image reference:
    • Update your Dockerfile to use a valid Python image, e.g.:
      FROM python:3.9
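
Before and after upgrading, it can help to confirm which versions are actually installed in the environment. The following is a minimal sketch using only the Python standard library; the helper name `installed_versions` is made up for illustration and is not part of `datasets` or `huggingface_hub`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Return a dict mapping each package name to its installed version,
    or to None when the package is not installed (hypothetical helper)."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # The two libraries discussed in this thread.
    for name, ver in installed_versions(["datasets", "huggingface_hub"]).items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it inside the Space (or locally in the same virtual environment) shows whether the pins in requirements.txt were really picked up.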
      

Hope this helps!


Hello @Alanturner2 , thank you for your suggestions.

Do you know how I can change the Docker image? I am using a Streamlit Space, so it is not clear to me how to do that.

[Edit] Please take a look at my last commit, which caused the Docker image issue: fix: load dataset · leomaurodesenv/qasports-website at c5b95e6


I just fixed it. Step by step:

  • Downgrade Python from 3.10 to 3.9.
  • Set the Streamlit SDK version.
  • Update requirements.txt with the latest datasets and huggingface_hub libraries.

README.md

sdk: streamlit
sdk_version: 1.33.0
python_version: 3.9
app_file: app.py
...
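
A possible explanation for the python:3.1 image error, which is an assumption and not confirmed anywhere in this thread: if the README previously contained an unquoted `python_version: 3.10`, a YAML loader would read it as a number, and the number 3.10 is the same as 3.1. The sketch below only imitates that numeric interpretation with plain Python; `as_yaml_number` is a hypothetical helper, not the code Spaces actually runs.

```python
def as_yaml_number(text):
    """Imitate a loader that treats an unquoted scalar such as 3.10 as a float.
    This is an illustrative assumption, not the actual Spaces build logic."""
    return str(float(text))


unquoted = as_yaml_number("3.10")  # numeric interpretation drops the trailing zero
quoted = "3.10"                    # a quoted YAML string would keep it intact

print(f"python:{unquoted}")  # python:3.1
print(f"python:{quoted}")    # python:3.10
```

If that is what happened, writing the value in quotes (for example `python_version: "3.10"`) would probably also work, while 3.9 is unaffected because it has no trailing zero to lose.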

requirements.txt

# HuggingFace
datasets==3.2.0
huggingface_hub==0.27.0

app.py

from datasets import load_dataset

dataset_name = "PedroCJardim/QASports"
dataset_split = "basketball"
dataset = load_dataset(dataset_name, name=dataset_split)

