Hugging Face datasets streaming problem

I am training my model with transformers.Trainer and datasets.load_dataset.
When I load a private dataset with the streaming option, I get an “Unauthorized” error even though I pass my private token. It only happens during training with transformers.Trainer. My guess is that authentication fails in the dataloader used by the Trainer. I would like to know why this started happening yesterday (before that, I never saw this message).

Hi! What kind of dataset are you trying to stream? We’ve recently fixed some issues with streaming private audio and vision datasets, so if your dataset falls into that category, you should update the lib to the newest version. Also, make sure to explicitly set use_auth_token=True or use_auth_token="<token>" in load_dataset when loading a private dataset.

Hi, thanks for your help. I tried it on my own text data saved as .parquet files (they were uploaded with datasets.DatasetDict.push_to_hub). My code is below:

from datasets import load_dataset

# Replace the placeholders with your repository ID and access token.
dataset = load_dataset("<repo_id>", use_auth_token="<private_token>", streaming=True)
for d in dataset["train"]:
    print(d)
    break  # only print the first example as a check

This causes the following error:

/usr/local/lib/python3.7/dist-packages/aiohttp/client_reqrep.py in raise_for_status(self)
1007 status=self.status,
1008 message=self.reason,
→ 1009 headers=self.headers,
1010 )
1011

ClientResponseError: 401, message='Unauthorized', url=URL('https://huggingface.co/datasets/.../train-00000-of-00001-168b451062c67c34.parquet')

At first, I guessed that the token I passed was the problem. But when I load the dataset again with streaming=False, it works fine.

So I wonder why this occurs! (The datasets version is “2.3.2”.)
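To make the comparison between the two modes reproducible, something like the sketch below could be used. It is only a rough diagnostic, not part of the datasets library: the helper names first_example and compare_modes and the optional loader argument are made up for illustration, and it assumes the dataset has a 'train' split and that load_dataset accepts use_auth_token as in the snippet above.

```python
from typing import Any, Callable, Dict, Optional


def first_example(repo_id: str, token: str, streaming: bool,
                  loader: Optional[Callable[..., Any]] = None) -> Any:
    """Load the dataset and return the first row of its 'train' split."""
    if loader is None:
        # Imported lazily so the helper can also be run with a stand-in loader.
        from datasets import load_dataset as loader
    dataset = loader(repo_id, use_auth_token=token, streaming=streaming)
    return next(iter(dataset["train"]))


def compare_modes(repo_id: str, token: str,
                  loader: Optional[Callable[..., Any]] = None) -> Dict[str, str]:
    """Try the non-streaming and streaming paths and record what happens in each."""
    results: Dict[str, str] = {}
    for streaming in (False, True):
        key = "streaming" if streaming else "non-streaming"
        try:
            first_example(repo_id, token, streaming, loader)
            results[key] = "ok"
        except Exception as error:  # broad on purpose: we only report the failure
            results[key] = f"failed: {type(error).__name__}: {error}"
    return results
```

Calling compare_modes("<repo_id>", "<private_token>") with real values returns a small dict. If "non-streaming" is "ok" while "streaming" reports a 401, the token itself is probably fine and the problem is specific to the streaming path.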

I fixed this error and opened a pull request for it: fix-auth-error-private-dataset by hkjeon13 · Pull Request #4699 · huggingface/datasets · GitHub

We’ll do a new release of datasets on Monday to include the fix :slight_smile:

Thank you! I’m looking forward to it :slight_smile:

We just did the release, please update datasets and let us know if you still have the issue :slight_smile:

pip install -U datasets
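After upgrading, it may help to confirm that the environment really picked up a newer datasets than the 2.3.2 reported earlier in the thread. The sketch below is one way to check. The helper names are made up for illustration, the version parsing is deliberately simple (it only compares the leading numeric components), and importlib.metadata needs Python 3.8 or later. The thread does not say which release contains the fix, so the check only tells you whether the upgrade took effect.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '2.3.2' into (2, 3, 2); stops at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if ch.isdigit():
                digits += ch
            else:
                break
        if not digits:
            break  # e.g. a 'dev0' suffix: ignore it and everything after
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string, or None if the package is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_newer_than(installed: str, reported: str) -> bool:
    """True if `installed` is strictly newer than `reported`."""
    return parse_version(installed) > parse_version(reported)


if __name__ == "__main__":
    version = installed_version("datasets")
    if version is None:
        print("datasets is not installed in this environment")
    elif is_newer_than(version, "2.3.2"):
        print(f"datasets {version} is newer than 2.3.2")
    else:
        print(f"datasets {version} is not newer than 2.3.2; the upgrade may not have applied")
```

In a notebook, remember to restart the runtime after pip install -U, otherwise the old version may still be the one that is imported.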