Natural Language Processing with Transformers, 02_classification.ipynb

Hi, I am running this example, but I get an error when I run this line:
emotions_local = load_dataset("csv", data_files="train.txt", sep=";", names=["text", "label"])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_92/3318475519.py in <cell line: 2>()
      1 #hide_output
----> 2 emotions_local = load_dataset("csv", data_files="train.txt", sep=";", 
      3                               names=["text", "label"])

~/.conda/envs/default/lib/python3.9/site-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, ignore_verifications, keep_in_memory, save_infos, revision, use_auth_token, task, streaming, script_version, **config_kwargs)
   1662             Keyword arguments to be passed to the `BuilderConfig`
   1663             and used in the [`DatasetBuilder`].
-> 1664 
   1665     Returns:
   1666         [`Dataset`] or [`DatasetDict`]:

~/.conda/envs/default/lib/python3.9/site-packages/datasets/builder.py in download_and_prepare(self, download_config, download_mode, ignore_verifications, try_from_hf_gcs, dl_manager, base_path, use_auth_token, **download_and_prepare_kwargs)
    591     def _info(self) -> DatasetInfo:
    592         """Construct the DatasetInfo object. See `DatasetInfo` for details.
--> 593 
    594         Warning: This function is only called once and the result is cached for all
    595         following .info() calls.

~/.conda/envs/default/lib/python3.9/site-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verify_infos, **prepare_split_kwargs)
    679                 Key/value pairs to be passed on to the caching file-system backend, if any.
    680 
--> 681                 <Added version="2.5.0"/>
    682             **download_and_prepare_kwargs (additional keyword arguments): Keyword arguments.
    683 

~/.conda/envs/default/lib/python3.9/site-packages/datasets/builder.py in _prepare_split(self, split_generator)
   1131                     for checksums_dict in split_checksums_dicts.values()
   1132                 )
-> 1133                 if self.info.dataset_size is not None and self.info.download_size is not None:
   1134                     self.info.size_in_bytes = (
   1135                         self.info.dataset_size + self.info.download_size + self.info.post_processing_size

~/.conda/envs/default/lib/python3.9/site-packages/tqdm/notebook.py in __iter__(self)
    252         try:
    253             it = super(tqdm_notebook, self).__iter__()
--> 254             for obj in it:
    255                 # return super(tqdm...) will not catch exception
    256                 yield obj

~/.conda/envs/default/lib/python3.9/site-packages/tqdm/std.py in __iter__(self)
   1164         # (note: keep this check outside the loop for performance)
   1165         if self.disable:
-> 1166             for obj in iterable:
   1167                 yield obj
   1168             return

~/.conda/envs/default/lib/python3.9/site-packages/datasets/packaged_modules/csv/csv.py in _generate_tables(self, files)
    168         dtype = (
    169             {
--> 170                 name: dtype.to_pandas_dtype() if not require_storage_cast(feature) else object
    171                 for name, dtype, feature in zip(schema.names, schema.types, self.config.features.values())
    172             }

TypeError: read_csv() got an unexpected keyword argument 'mangle_dupe_cols'

Does anyone know how to solve it?


Thanks


The cause is that the `mangle_dupe_cols` keyword was removed from `pandas.read_csv()` in pandas 2.0, but older versions of the `datasets` library still pass it. If you are working locally, either downgrade your pandas version or upgrade your `datasets` library. Since you are working in a Kaggle notebook, I would advise just downgrading your pandas version.
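As a sketch, either of these should work (exact version pins are assumptions; any pandas release below 2.0 avoids the removed keyword):

```shell
# Option 1: downgrade pandas to a pre-2.0 release,
# which still accepts the mangle_dupe_cols keyword
pip install "pandas<2.0"

# Option 2 (if working locally): upgrade datasets instead,
# since newer releases no longer pass mangle_dupe_cols to read_csv()
pip install --upgrade datasets
```

After installing, restart the notebook kernel so the new versions are picked up.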

What version of pandas do I have to use on Kaggle?