Load_dataset split='test' not working

gpric024 · February 7, 2024, 3:10pm

This line of code has been working for me for months. However, today I ran the code and it's generating the train split. The 'split' argument is being ignored.

dataset_wmt_enfr = load_dataset("wmt14", "fr-en", split="test")

Has anyone else run into this issue? I saw there was a change to the repository a couple of days ago…

Thanks

albertvillanova · February 8, 2024, 9:16am

Hi @gpric024,

I cannot reproduce the issue.

I made the latest changes to the dataset, but these were just:

- Removing the legacy JSON file: Delete legacy JSON metadata (#4) · wmt14 at 473719a
- Updating one of the source data URLs (OPUS changed their server and the previous URL was returning a 404 error): Fix OPUS download URL (#5) · wmt14 at 0aab7df
- Updating the documentation in the README.md file:
  - Update warning disclosure with organizers' response (#6) · wmt14 at 2fbd0b7
  - Minor fix in dataset card (#7) · wmt14 at 76f6eba

I guess that until now you were using the copy of the dataset in your local cache, and now that the dataset has been updated on the Hub, the library tries to regenerate it and cache it again, without success. I think this could happen if you have a very old version of datasets. Could you please verify?

import datasets
print(datasets.__version__)

If this is the case, I recommend updating it:

pip install -U datasets
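As a rough way to script the version check above, the sketch below reads the installed version with the standard library and compares it against a threshold. The value of `ASSUMED_MINIMUM` is purely a placeholder (this thread does not say which release counts as "very old"), and `parse_release` and `is_older_than` are hypothetical helper names, not part of the datasets library.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_string):
    """Return the leading numeric components, e.g. '2.16.1' -> (2, 16, 1)."""
    parts = []
    for piece in version_string.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            # Stop at the first non-numeric component such as 'dev0'.
            break
        parts.append(int(digits))
    return tuple(parts)


def is_older_than(installed, minimum):
    """True if `installed` sorts before `minimum` on numeric components."""
    return parse_release(installed) < parse_release(minimum)


# ASSUMPTION: placeholder threshold for illustration only; pick your own.
ASSUMED_MINIMUM = "2.16.0"

try:
    installed_version = version("datasets")
except PackageNotFoundError:
    installed_version = None

if installed_version is None:
    print("datasets is not installed in this environment")
elif is_older_than(installed_version, ASSUMED_MINIMUM):
    print(f"datasets {installed_version} is older than {ASSUMED_MINIMUM}; consider upgrading")
else:
    print(f"datasets {installed_version} is at least {ASSUMED_MINIMUM}")
```

The comparison only looks at the numeric release components, so pre-release suffixes are ignored; that is usually enough for a quick sanity check.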

If the problem still persists on your side after that, I would ask you to include the complete error stack trace and information about your environment (by running the shell command datasets-cli env).
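If the stale-cache guess above is right, one thing that might be worth trying after upgrading is to bypass the local cache and then confirm which split actually came back. This is only a sketch: `load_test_split_fresh` and `reported_split` are hypothetical helper names, the `download_mode="force_redownload"` option re-downloads the data (which can be large for this dataset), and the check relies on the returned object exposing a `split` attribute.

```python
def load_test_split_fresh():
    """Reload the WMT14 fr-en test split, ignoring the local cache.

    The import is kept inside the function so that this module can be
    imported even where datasets is not installed.
    """
    from datasets import load_dataset

    return load_dataset(
        "wmt14",
        "fr-en",
        split="test",
        download_mode="force_redownload",  # skip cached files and regenerate
    )


def reported_split(ds):
    """Return the split name the object reports, or None if it has none.

    ASSUMPTION: a single-split result exposes a `split` attribute; a
    dict-like result containing several splits is treated as None here.
    """
    split = getattr(ds, "split", None)
    return None if split is None else str(split)


# Example usage (commented out because it triggers a download):
# ds = load_test_split_fresh()
# print(reported_split(ds), len(ds))
```

If `reported_split` comes back as `train`, or as None because a dictionary of splits was returned, that output would be useful to include alongside the `datasets-cli env` report.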

