I trained my model with transformers.Trainer and datasets.load_dataset.
When I stream a private dataset, I get an "Unauthorized" error even though I pass my private token. It happens only during training with transformers.Trainer, so I suspect an authentication problem in the Trainer's dataloader. The error first appeared yesterday (before that I never saw a message like this), and I would like to know why it happens.
Hi! What kind of dataset are you trying to stream? We’ve recently fixed some issues with streaming private audio and vision datasets, so if your dataset falls into that category, you should update the lib to the newest version. Also, make sure to explicitly set use_auth_token=True or use_auth_token="<token>" in load_dataset when loading a private dataset.
Hi, thanks for your help. I tried it on my personal text data saved with the .parquet extension (it was actually uploaded with the function datasets.DatasetDict.push_to_hub). My code is below:
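from datasets import load_dataset

# "<repo_id>" and "<private_token>" are placeholders for the private repo and its access token
dataset = load_dataset("<repo_id>", use_auth_token="<private_token>", streaming=True)
for d in dataset['train']:
    print(d)
    break  # stop after the first example; this is just a check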
It causes an error like this:
/usr/local/lib/python3.7/dist-packages/aiohttp/client_reqrep.py in raise_for_status(self)
1007 status=self.status,
1008 message=self.reason,
-> 1009 headers=self.headers,
1010 )
1011
ClientResponseError: 401, message='Unauthorized', url=URL('https://huggingface.co/datasets/.../train-00000-of-00001-168b451062c67c34.parquet')
At first I guessed the token I passed was the problem, but when I load the dataset again with the streaming=False option, it works well.
So I wonder why this occurs! (The datasets version is 2.3.2.)
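One way to confirm that the failure is specific to streaming is to run the same load in both modes and compare the outcomes. The following is only a minimal sketch: the helper names first_example_status and compare_modes are hypothetical, the loader is passed in as an argument so the helper itself does not depend on any particular datasets version, and it assumes the dataset has a 'train' split as in the snippet above.

```python
def first_example_status(load_fn, repo_id, token, streaming):
    """Try to read one example from the 'train' split.

    Returns "ok" on success, otherwise the exception type and message.
    load_fn is expected to accept the same keyword arguments used in this
    thread (use_auth_token, streaming), e.g. datasets.load_dataset.
    """
    try:
        dataset = load_fn(repo_id, use_auth_token=token, streaming=streaming)
        # With streaming, the request for the data file may only happen here,
        # so the iteration has to be inside the try block as well.
        next(iter(dataset["train"]))
        return "ok"
    except Exception as err:  # broad on purpose: this is a diagnostic helper
        return f"{type(err).__name__}: {err}"


def compare_modes(load_fn, repo_id, token):
    """Run the same load with and without streaming and report both results."""
    return {
        "streaming": first_example_status(load_fn, repo_id, token, True),
        "non_streaming": first_example_status(load_fn, repo_id, token, False),
    }


# Hypothetical usage (requires network access and a real private repo):
# from datasets import load_dataset
# print(compare_modes(load_dataset, "<repo_id>", "<private_token>"))
```

If only the "streaming" entry reports an Unauthorized error while "non_streaming" is "ok", the token itself is probably fine and the problem lies in how it is forwarded when streaming.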
I fixed this error and opened a pull request for it: fix-auth-error-private-dataset by hkjeon13 · Pull Request #4699 · huggingface/datasets · GitHub
We’ll do a new release of datasets on Monday to include the fix.
Thank you! I’m looking forward to it.
We just did the release. Please update datasets and let us know if you still have the issue:
pip install -U datasets
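After upgrading, it can help to confirm that the environment really picked up a version newer than the affected one (2.3.2, as reported above). The sketch below is an assumption-laden check, not an official tool: the helper names are hypothetical, the version parsing is deliberately simple (it keeps only the leading numeric parts), and importlib.metadata requires Python 3.8 or later. The thread does not state which release number contains the fix, so this only tells you that the install is newer than 2.3.2.

```python
from importlib import metadata


def parse_version(text):
    """Turn '2.3.2' into (2, 3, 2); stops at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def is_newer_than(installed, affected="2.3.2"):
    """True if the installed version string is strictly newer than `affected`."""
    return parse_version(installed) > parse_version(affected)


def installed_datasets_version():
    """Return the installed datasets version string, or None if not installed."""
    try:
        return metadata.version("datasets")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_datasets_version()
    if version is None:
        print("datasets is not installed in this environment")
    elif is_newer_than(version):
        print(f"datasets {version} is newer than 2.3.2")
    else:
        print(f"datasets {version} is not newer than 2.3.2; try upgrading again")
```

In a notebook, remember to restart the runtime after upgrading, since an already imported datasets module keeps the old code loaded.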