I'm trying to push my dataset to the Hub using `dataset.push_to_hub`, but I get the following error:
```
Pushing split train to the Hub.
Domain: work
Pushing dataset shards to the dataset hub:   0%|          | 0/1 [00:00<?, ?it/s]
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In [37], line 8
      6 dataset = dataset.train_test_split(test_size=0.2)
      7 domain_datasets[domain] = dataset.remove_columns(["domain", "__index_level_0__"])
----> 8 dataset.push_to_hub(f"fathyshalab/{domain}")
     10 domain_datasets["work"]

File ~/.conda/envs/baselines-transformers/lib/python3.9/site-packages/datasets/dataset_dict.py:1350, in DatasetDict.push_to_hub(self, repo_id, private, token, branch, max_shard_size, shard_size, embed_external_files)
   1348 logger.warning(f"Pushing split {split} to the Hub.")
   1349 # The split=key needs to be removed before merging
-> 1350 repo_id, split, uploaded_size, dataset_nbytes, _, _ = self[split]._push_parquet_shards_to_hub(
   1351     repo_id,
   1352     split=split,
   1353     private=private,
   1354     token=token,
   1355     branch=branch,
   1356     max_shard_size=max_shard_size,
   1357     embed_external_files=embed_external_files,
   1358 )
   1359 total_uploaded_size += uploaded_size
   1360 total_dataset_nbytes += dataset_nbytes

File ~/.conda/envs/baselines-transformers/lib/python3.9/site-packages/datasets/arrow_dataset.py:4195, in Dataset._push_parquet_shards_to_hub(self, repo_id, split, private, token, branch, max_shard_size, embed_external_files)
   4193 shard.to_parquet(buffer)
...
    121     fn_name=fn.__name__, has_token=has_token, kwargs=kwargs
    122 )
--> 124 return fn(*args, **kwargs)

TypeError: upload_file() got an unexpected keyword argument 'identical_ok'
```
From what I understand, `arrow_dataset.py` (or the upload code it calls) still passes an `identical_ok` argument to `upload_file()`, but the installed version of that function no longer accepts it — is that right?
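For illustration, the failure mode itself is generic Python keyword handling. The sketch below uses a hypothetical stub standing in for `upload_file` after the keyword was removed; it is not the real `huggingface_hub` code, just a minimal reproduction of the same `TypeError`:

```python
# Hypothetical stand-in for an upload_file() whose signature no longer
# includes the identical_ok parameter.
def upload_file(path_or_fileobj, path_in_repo, repo_id):
    return f"uploaded {path_in_repo} to {repo_id}"

# An older caller that still forwards identical_ok=... hits the same error
# as the traceback above.
try:
    upload_file("shard.parquet", "data/train-00000.parquet",
                "fathyshalab/work", identical_ok=False)
except TypeError as e:
    print(e)  # → upload_file() got an unexpected keyword argument 'identical_ok'
```

In practice this usually points to a version mismatch between `datasets` and `huggingface_hub`, so aligning the two (e.g. upgrading `datasets`, or pinning `huggingface_hub` to a version that still accepts the keyword) would be the first thing I'd try.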