Natural Language Processing with Transformers, 02_classification.ipynb

Hi, I am running this example, but I get an error when I run this line:
emotions_local = load_dataset("csv", data_files="train.txt", sep=";", names=["text", "label"])

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_92/3318475519.py in <cell line: 2>()
      1 #hide_output
----> 2 emotions_local = load_dataset("csv", data_files="train.txt", sep=";", 
      3                               names=["text", "label"])

~/.conda/envs/default/lib/python3.9/site-packages/datasets/load.py in load_dataset(path, name, data_dir, data_files, split, cache_dir, features, download_config, download_mode, ignore_verifications, keep_in_memory, save_infos, revision, use_auth_token, task, streaming, script_version, **config_kwargs)
   1662             Keyword arguments to be passed to the `BuilderConfig`
   1663             and used in the [`DatasetBuilder`].
-> 1664 
   1665     Returns:
   1666         [`Dataset`] or [`DatasetDict`]:

~/.conda/envs/default/lib/python3.9/site-packages/datasets/builder.py in download_and_prepare(self, download_config, download_mode, ignore_verifications, try_from_hf_gcs, dl_manager, base_path, use_auth_token, **download_and_prepare_kwargs)
    591     def _info(self) -> DatasetInfo:
    592         """Construct the DatasetInfo object. See `DatasetInfo` for details.
--> 593 
    594         Warning: This function is only called once and the result is cached for all
    595         following .info() calls.

~/.conda/envs/default/lib/python3.9/site-packages/datasets/builder.py in _download_and_prepare(self, dl_manager, verify_infos, **prepare_split_kwargs)
    679                 Key/value pairs to be passed on to the caching file-system backend, if any.
    680 
--> 681                 <Added version="2.5.0"/>
    682             **download_and_prepare_kwargs (additional keyword arguments): Keyword arguments.
    683 

~/.conda/envs/default/lib/python3.9/site-packages/datasets/builder.py in _prepare_split(self, split_generator)
   1131                     for checksums_dict in split_checksums_dicts.values()
   1132                 )
-> 1133                 if self.info.dataset_size is not None and self.info.download_size is not None:
   1134                     self.info.size_in_bytes = (
   1135                         self.info.dataset_size + self.info.download_size + self.info.post_processing_size

~/.conda/envs/default/lib/python3.9/site-packages/tqdm/notebook.py in __iter__(self)
    252         try:
    253             it = super(tqdm_notebook, self).__iter__()
--> 254             for obj in it:
    255                 # return super(tqdm...) will not catch exception
    256                 yield obj

~/.conda/envs/default/lib/python3.9/site-packages/tqdm/std.py in __iter__(self)
   1164         # (note: keep this check outside the loop for performance)
   1165         if self.disable:
-> 1166             for obj in iterable:
   1167                 yield obj
   1168             return

~/.conda/envs/default/lib/python3.9/site-packages/datasets/packaged_modules/csv/csv.py in _generate_tables(self, files)
    168         dtype = (
    169             {
--> 170                 name: dtype.to_pandas_dtype() if not require_storage_cast(feature) else object
    171                 for name, dtype, feature in zip(schema.names, schema.types, self.config.features.values())
    172             }

TypeError: read_csv() got an unexpected keyword argument 'mangle_dupe_cols'

Does anyone know how to solve it?


Thanks


The cause is that the `mangle_dupe_cols` keyword was removed from `pandas.read_csv()` in pandas 2.0, but older versions of the `datasets` library still pass it. If you are working locally, either downgrade your pandas version or upgrade your `datasets` library. Since you are working in a Kaggle notebook, I would advise just downgrading your pandas version.
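As a sketch, either of these should work (exact version pins are assumptions; any pandas release below 2.0 avoids the removed keyword):

```shell
# Option 1: downgrade pandas to a pre-2.0 release,
# which still accepts the mangle_dupe_cols keyword
pip install "pandas<2.0"

# Option 2 (if working locally): upgrade datasets instead,
# since newer releases no longer pass mangle_dupe_cols to read_csv()
pip install --upgrade datasets
```

After installing, restart the notebook kernel so the new versions are picked up.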

What version of pandas do I have to use on Kaggle?