I’m running into an issue where some fairly basic code, which definitely worked about 10 months ago, now produces an error message. The Docker image where the code runs has not changed, so it must be some invisible change on the Hugging Face side.
Here is the entire code snippet:
from datasets import load_dataset
dataset = load_dataset("sms_spam")
Previously, this retrieved the sms_spam dataset. Now, it produces a KeyError: 'tags'.
My package versions are:
datasets==2.18.0
huggingface-hub==0.21.4
transformers==4.36.0
I see another thread flagging this same KeyError, and the suggested solution is to upgrade datasets and huggingface_hub: Huggingface dataset install - #4 by John6666
I will follow this advice as a temporary solution, but it’s not a permanent fix.
I am trying to write Hugging Face code that will keep working for years, not stop working after a couple of months!
Can anyone explain:
- Whether there is any way to prevent this from happening again? For example, can I “pin” the dataset itself to a particular version? Or some other argument, e.g. the URL that is called when the dataset is loaded? I suppose the alternative is downloading and saving all datasets locally, but that’s less convenient, and convenience is why I’m using load_dataset in the first place.
- Why this stopped working? For example, is there a particular discussion or commit message where I can read about how it was decided to make old versions of these packages stop working? Even if it’s not preventable, if I can understand where these conversations are happening, that will let me be less reactive, more proactive.
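To make the first question concrete, here is a minimal sketch of the kind of pinning I have in mind. It passes the revision argument of load_dataset (a branch, tag, or commit hash of the dataset repository) and keeps a local copy with save_to_disk, so later runs read from disk instead of calling the Hub. The helper names snapshot_dir and load_pinned, the hf_snapshots folder, and the folder naming scheme are my own placeholders, not anything from the library. I am also assuming, without having confirmed it, that pinning the revision alone would not protect against changes in what the Hub API returns, which is why the sketch falls back to a local copy.

```python
from pathlib import Path


def snapshot_dir(root: str, name: str, revision: str) -> Path:
    """Directory for a local copy of one dataset at one revision (hypothetical layout)."""
    # Replace "/" so namespaced names like "user/dataset" map to a single folder.
    return Path(root) / f"{name.replace('/', '__')}@{revision}"


def load_pinned(name: str, revision: str, root: str = "hf_snapshots"):
    """Load a dataset from a local snapshot, downloading it only the first time."""
    # Imported lazily so the path helper above works without `datasets` installed.
    from datasets import load_dataset, load_from_disk

    target = snapshot_dir(root, name, revision)
    if target.exists():
        # A snapshot saved earlier is read straight from disk.
        return load_from_disk(str(target))

    # `revision` pins the dataset repository to a branch, tag, or commit hash.
    dataset = load_dataset(name, revision=revision)
    dataset.save_to_disk(str(target))
    return dataset
```

I would call it as load_pinned("sms_spam", revision="main"), ideally replacing "main" with a specific commit hash from the dataset repository, since a branch name can still move. The first call still depends on the installed datasets and huggingface_hub versions being able to talk to the Hub, so this only helps once the snapshot exists.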
Thanks!