Hi folks.
I’ve been following this tutorial notebook (https://github.com/NielsRogge/Transformers-Tutorials/blob/master/LayoutLM/Fine_tuning_LayoutLMForSequenceClassification_on_RVL_CDIP.ipynb) to figure out how to use the LayoutLM model.
I was able to fine-tune the model on my desktop with an open-source dataset, but when I run the same script on my own dataset, I get the following error:
Traceback (most recent call last):
  File "main_v1.py", line 410, in <module>
    main(csv=True)
  File "main_v1.py", line 322, in main
    encoded_train_dataset = train_dataset.map(lambda example: encode_example(example), features=features)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 2364, in map
    desc=desc,
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 532, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 499, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/datasets/fingerprint.py", line 458, in wrapper
    out = func(self, *args, **kwargs)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/datasets/arrow_dataset.py", line 2757, in _map_single
    writer.finalize()
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/datasets/arrow_writer.py", line 537, in finalize
    self.write_examples_on_file()
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/datasets/arrow_writer.py", line 414, in write_examples_on_file
    self.write_batch(batch_examples=batch_examples)
  File "/home/ec2-user/anaconda3/envs/JupyterSystemEnv/lib/python3.7/site-packages/datasets/arrow_writer.py", line 504, in write_batch
    col_type = features[col] if features else None
KeyError: '__index_level_0__'
So I took a look at arrow_writer.py (https://github.com/huggingface/datasets/blob/master/src/datasets/arrow_writer.py), and it appears I’m getting this error because the code treats '__index_level_0__' as the name of a column being written and cannot find it in features. This is confusing, because I never define a column with that name. These are the features I’m using:
from datasets import Array2D, ClassLabel, Features, Sequence, Value

features = Features({
    'input_ids': Sequence(feature=Value(dtype='int64')),
    'bbox': Array2D(dtype="int64", shape=(512, 4)),
    'attention_mask': Sequence(Value(dtype='int64')),
    'token_type_ids': Sequence(Value(dtype='int64')),
    'label': ClassLabel(names=['refuted', 'entailed']),
    'image_path': Value(dtype='string'),
    'words': Sequence(feature=Value(dtype='string')),
})
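Since the failing line looks up every column name in features, one way to narrow this down would be to list which dataset columns have no entry there. Below is a minimal sketch of that check in plain Python. The helper name columns_missing_from_features and the sample column list are hypothetical, made up for illustration, and are not part of the datasets library or of my script.

```python
def columns_missing_from_features(column_names, feature_names):
    """Return the columns that have no entry in the features mapping, in order."""
    known = set(feature_names)
    return [name for name in column_names if name not in known]


# Hypothetical column list standing in for train_dataset.column_names;
# "some_extra_column" is a made-up name for a column the features do not cover.
example_columns = ["image_path", "words", "label", "some_extra_column"]

# Keys of the features definition above.
example_features = [
    "input_ids", "bbox", "attention_mask", "token_type_ids",
    "label", "image_path", "words",
]

print(columns_missing_from_features(example_columns, example_features))
# → ['some_extra_column']
```

On the real objects, and assuming train_dataset is a datasets.Dataset, the call would be columns_missing_from_features(train_dataset.column_names, features.keys()). Any name it returns is a column the writer will fail on when features is passed to map.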
Does anyone have any idea what I’m doing wrong? I’m not sure where to look to debug this issue.