KeyError: Field ".." does not exist in table schema

marcomatta · November 29, 2021, 4:42pm

Hi everyone! I’m trying to run the run_ner.py script to perform a NER task on a custom dataset. The dataset originally consisted of 3 TSV files, which I converted to CSV files in order to run the script. Unfortunately, I got this error:

Traceback (most recent call last):
  File "C:\Users\User\Desktop\NLP\run_ner.py", line 578, in <module>
    main()
  File "C:\Users\User\Desktop\NLP\run_ner.py", line 262, in main
    raw_datasets = load_dataset(extension, data_files=data_files, cache_dir=model_args.cache_dir)
  File "C:\Users\User\Desktop\NLP\myenv\lib\site-packages\datasets\load.py", line 1664, in load_dataset
    builder_instance.download_and_prepare(
  File "C:\Users\User\Desktop\NLP\myenv\lib\site-packages\datasets\builder.py", line 593, in download_and_prepare
    self._download_and_prepare(
  File "C:\Users\User\Desktop\NLP\myenv\lib\site-packages\datasets\builder.py", line 681, in _download_and_prepare
    self._prepare_split(split_generator, **prepare_split_kwargs)
  File "C:\Users\User\Desktop\NLP\myenv\lib\site-packages\datasets\builder.py", line 1136, in _prepare_split
    writer.write_table(table)
  File "C:\Users\User\Desktop\NLP\myenv\lib\site-packages\datasets\arrow_writer.py", line 454, in write_table
    pa_table = pa.Table.from_arrays([pa_table[name] for name in self._schema.names], schema=self._schema)
  File "C:\Users\User\Desktop\NLP\myenv\lib\site-packages\datasets\arrow_writer.py", line 454, in <listcomp>
    pa_table = pa.Table.from_arrays([pa_table[name] for name in self._schema.names], schema=self._schema)
  File "pyarrow\table.pxi", line 1339, in pyarrow.lib.Table.__getitem__
  File "pyarrow\table.pxi", line 1900, in pyarrow.lib.Table.column
  File "pyarrow\table.pxi", line 1875, in pyarrow.lib.Table._ensure_integer_index
KeyError: 'Field "Il" does not exist in table schema'

The head of the train csv is like:

I think the field “Il” that the KeyError refers to is the first row of train_labeled.csv.

The command that I’m running is:

python run_ner.py --model_name_or_path Musixmatch/umberto-commoncrawl-cased-v1 --tokenizer_name Musixmatch/umberto-commoncrawl-cased-v1 --train_file train_labeled.csv --validation_file devel_labeled.csv --test_file test_labeled.csv --output_dir umberto-ner --do_train --do_eval --do_predict

Can someone help me with this issue? Thanks!

marcomatta · December 2, 2021, 4:07pm

I solved the problem: the CSV files generated with pandas needed post-processing in Excel, because words and labels had to be in two separate columns.
Originally they looked like this:

col A
word,label

They have to be:

col A col B
word label
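
For anyone who would rather skip the manual Excel step, the same split can probably be done in a few lines of Python. This is only a minimal sketch using the standard-library csv module, not what was actually done in this thread. It assumes each line of the original TSV holds exactly one word and one label separated by a tab, with no header row. The function name `tsv_to_two_column_csv` and the header names `word` and `label` are hypothetical, taken from the illustration above.

```python
import csv
import io


def tsv_to_two_column_csv(tsv_file, csv_file, header=("word", "label")):
    """Copy word<TAB>label lines into a CSV with two real columns.

    Assumption: the TSV has no header row. The default header names
    are hypothetical and only mirror the illustration in this thread.
    """
    # QUOTE_NONE keeps quote characters inside tokens as literal text.
    reader = csv.reader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    writer = csv.writer(csv_file, lineterminator="\n")
    writer.writerow(header)
    rows_written = 0
    for row in reader:
        if len(row) != 2:
            # Skips blank separator lines and any malformed row.
            continue
        writer.writerow(row)
        rows_written += 1
    return rows_written


# Small in-memory demonstration with made-up rows.
sample_tsv = "Il\tO\npaziente\tO\n\nRoma\tB-LOC\n"
source = io.StringIO(sample_tsv)
target = io.StringIO()
count = tsv_to_two_column_csv(source, target)
print(target.getvalue())
```

With real files, open both with `newline=""` and an explicit encoding such as `encoding="utf-8"` before passing them to the function. Rows that do not have exactly two fields are dropped silently here, so it is worth comparing the returned count against the number of lines you expect.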
