Hugging Face dataset loading error: KeyError: 'tags'

Dear all,
This is my error. I tried with a different dataset, but it raises the same error.
I used this code:

from datasets import load_dataset
coco_dataset = load_dataset("jxu124/refcoco-benchmark")

Error: KeyError: 'tags'

3 Likes

This issue?

Same issue on both "microsoft/orca-agentinstruct-1M-v1" and "trl-lib/Capybara".

1 Like

I can load this…

DatasetDict({
    refcoco_unc_val: Dataset({
        features: ['ref_list', 'image_info', 'image'],
        num_rows: 1500
    })
    refcoco_unc_testA: Dataset({
        features: ['ref_list', 'image_info', 'image'],
        num_rows: 750
    })
    refcoco_unc_testB:
...

Edit:
Try

pip install -U datasets huggingface_hub
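Then restart the Python session or notebook kernel and confirm that the new versions are actually the ones being imported (a quick sanity check; your exact version numbers may differ):

# Check which versions the running interpreter picks up after the upgrade
import datasets
import huggingface_hub

print(datasets.__version__)
print(huggingface_hub.__version__)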
2 Likes

I fixed it by upgrading huggingface_hub to 0.27.0.

6 Likes

I’m still running into the same error (KeyError: 'tags'), even after trying different datasets and package versions:

  • huggingface-hub: 0.27.0, datasets: 3.2.0, transformers: 4.40.2
  • huggingface-hub: 0.19.2, datasets: 2.15.0, transformers: 4.30.2

Everything worked fine before, though.

Here’s the code snippet for reference:

# Load the module
from datasets import load_dataset_builder

# Create the dataset builder
reviews_builder = load_dataset_builder("derenrich/wikidata-en-descriptions-small")

# Print the features
print(reviews_builder.info.features)
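
In case it helps narrow things down, here is a check that bypasses datasets entirely and asks the Hub for the repository metadata directly (just a diagnostic sketch; dataset_info comes from huggingface_hub):

# Fetch the raw Hub metadata for the repo, independent of the datasets library
from huggingface_hub import HfApi

info = HfApi().dataset_info("derenrich/wikidata-en-descriptions-small")
print(info.tags)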

Let me know if you spot something I might be missing!

2 Likes

It works on my end, so something must be wrong in your environment… Here are my package versions and the output:

Package             Version
------------------- -----------
aiofiles            23.2.1
aiohappyeyeballs    2.4.4
aiohttp             3.11.11
aiosignal           1.3.2
annotated-types     0.7.0
anyio               4.7.0
async-timeout       5.0.1
attrs               24.3.0
Authlib             1.4.0
certifi             2024.12.14
cffi                1.17.1
charset-normalizer  3.4.1
click               8.0.4
contourpy           1.3.1
cryptography        44.0.0
cycler              0.12.1
datasets            3.2.0
dill                0.3.8
exceptiongroup      1.2.2
fastapi             0.115.6
ffmpy               0.5.0
filelock            3.16.1
fonttools           4.55.3
frozenlist          1.5.0
fsspec              2024.9.0
gradio              4.44.0
gradio_client       1.3.0
h11                 0.14.0
hf_transfer         0.1.8
httpcore            1.0.7
httpx               0.28.1
huggingface-hub     0.27.0
idna                3.10
importlib_resources 6.4.5
itsdangerous        2.2.0
Jinja2              3.1.5
kiwisolver          1.4.8
markdown-it-py      3.0.0
MarkupSafe          2.1.5
matplotlib          3.10.0
mdurl               0.1.2
multidict           6.1.0
multiprocess        0.70.16
numpy               2.2.1
orjson              3.10.13
packaging           24.2
pandas              2.2.3
pillow              10.4.0
pip                 24.3.1
propcache           0.2.1
protobuf            3.20.3
psutil              5.9.8
pyarrow             18.1.0
pycparser           2.22
pydantic            2.10.4
pydantic_core       2.27.2
pydub               0.25.1
Pygments            2.18.0
pyparsing           3.2.0
python-dateutil     2.9.0.post0
python-multipart    0.0.20
pytz                2024.2
PyYAML              6.0.2
requests            2.32.3
rich                13.9.4
ruff                0.8.4
semantic-version    2.10.0
setuptools          65.5.1
shellingham         1.5.4
six                 1.17.0
sniffio             1.3.1
spaces              0.31.1
starlette           0.41.3
tomlkit             0.12.0
tqdm                4.67.1
typer               0.15.1
typing_extensions   4.12.2
tzdata              2024.2
urllib3             2.3.0
uvicorn             0.34.0
websockets          12.0
wheel               0.45.1
xxhash              3.5.0
yarl                1.18.3
{'output': Value(dtype='string', id=None), 'qid': Value(dtype='string', id=None), 'name': Value(dtype='string', id=None), 'input': Value(dtype='string', id=None), 'instruction': Value(dtype='string', id=None), 'text': Value(dtype='string', id=None)}

First, ensure that the dataset you're working with ("jxu124/refcoco-benchmark") actually contains a field named 'tags'. You can inspect the available columns or features of the dataset.
Once you know the correct feature names, try accessing the dataset accordingly.
If 'tags' is not a valid key in the dataset, you'll need to adjust your code to access the correct fields.
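As a minimal sketch (the split name refcoco_unc_val is taken from the DatasetDict output earlier in this thread):

# List the splits and feature names before indexing into specific keys
from datasets import load_dataset

ds = load_dataset("jxu124/refcoco-benchmark")
print(ds)                                  # shows all splits and their features
print(ds["refcoco_unc_val"].column_names)  # e.g. ['ref_list', 'image_info', 'image']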

1 Like

@Qingru and @John6666 are beautiful people. Upgrading huggingface_hub worked for me.

2 Likes

I recently encountered the same problem, which I hadn't experienced in the past six months. Thank you for your help :slight_smile:. The issue was resolved after updating these two libraries, but I'm curious why this error suddenly occurred. Initially I thought it might be a malfunction on my computer, and I was quite troubled by it.

2 Likes

but I’m curious why this error suddenly occurred.

The huggingface_hub library may be updated without notice when other HF libraries are updated. Also, if the behavior of the HF server side changes, the behavior of this library changes as a result.
So if you run into problems with file handling or network requests in the HF-related libraries, upgrading or downgrading huggingface_hub will often solve the problem.
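For example, to pin huggingface_hub to a specific release (0.27.0 is simply the version that happened to work earlier in this thread, not an official requirement):

pip install "huggingface_hub==0.27.0"

and to move back to the latest release afterwards:

pip install -U huggingface_hub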

3 Likes

Thank you, you’re really a warm-hearted person, have a nice day! :hugs:

1 Like

Thank you, it helped me resolve my issue. However, I then ran into another problem: someone pointed out that my Docker image was failing, and I needed to downgrade the Python version to fix it.

1 Like

This solved the issue for me. Many thanks :smiley:

1 Like