load_dataset split='test' not working

This line of code has been working for me for months. However, today I ran it and it's generating the train split. The 'split' argument is being ignored.

from datasets import load_dataset
dataset_wmt_enfr = load_dataset("wmt14", "fr-en", split="test")

Has anyone else run into this issue? I saw there was a change to the repository a couple of days ago…

Thanks

Hi @gpric024,

I cannot reproduce the issue.

I made the latest changes to the dataset, but these were just:

I guess that until now you were using the copy of the dataset in your local cache. Now that the dataset has been updated on the Hub, the library tries to regenerate and cache it again, without success. I think this could happen if you have a very old version of datasets. Could you please check which version you have?

import datasets
print(datasets.__version__)  # print() so this also shows output when run as a script
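
If importing the library is inconvenient (for example, the import itself fails), the installed version can also be read from the package metadata. This is only a minimal sketch using the standard library; the helper name `installed_version` is hypothetical and not part of datasets:

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or None if datasets is not installed
print("datasets:", installed_version("datasets"))
```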

If this is the case, I recommend updating it:

pip install -U datasets
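
After updating, it may help to confirm that the split you get back is the one you asked for, and to rule out a stale cache. The sketch below is one possible way to do that, not an official recipe: the helper names `assert_split` and `load_and_check` are hypothetical, and it assumes that the returned dataset's `split` attribute reports the plain split name (which may not hold for slices such as `test[:10%]`).

```python
def assert_split(ds, expected: str) -> None:
    """Raise ValueError if the dataset does not report the expected split."""
    # Assumption: `ds.split` stringifies to the split name, e.g. "test".
    actual = str(getattr(ds, "split", None))
    if actual != expected:
        raise ValueError(f"requested split {expected!r} but got {actual!r}")


def load_and_check(path: str, name: str, split: str, force: bool = False):
    """Load one split, optionally bypassing the local cache, and verify it."""
    # Imported lazily so the helpers above can be used without datasets installed.
    from datasets import load_dataset

    # "force_redownload" ignores the cached copy; this can mean a large download.
    mode = "force_redownload" if force else "reuse_dataset_if_exists"
    ds = load_dataset(path, name, split=split, download_mode=mode)
    assert_split(ds, split)
    return ds
```

For the call in the question, this would be `load_and_check("wmt14", "fr-en", "test")`, adding `force=True` only if the cached copy is suspected to be the problem.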

If the problem still persists on your side after that, please include the complete stack trace and information about your environment (by running the shell command datasets-cli env).

