Hi,
Unfortunately I still have this problem, even though I am using the latest datasets version, 1.8.0.
I am trying to run run_ner.py from transformers/examples/pytorch/token-classification (master branch of huggingface/transformers on GitHub) on Google Colab with a custom dataset. It works for a small subset, but when I use the entire dataset I always get the following error:
0% 0/1 [00:00<?, ?ba/s]
Traceback (most recent call last):
  File "/content/drive/MyDrive/ner/run_ner.py", line 512, in <module>
    main()
  File "/content/drive/MyDrive/ner/run_ner.py", line 359, in main
    load_from_cache_file=not data_args.overwrite_cache,
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 1635, in map
    desc=desc,
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 186, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/fingerprint.py", line 397, in wrapper
    out = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 1954, in _map_single
    batch = input_dataset[i : i + batch_size]
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 1484, in __getitem__
    format_kwargs=self._format_kwargs,
  File "/usr/local/lib/python3.7/dist-packages/datasets/arrow_dataset.py", line 1471, in _getitem
    pa_subtable = query_table(self._data, key, indices=self._indices if self._indices is not None else None)
  File "/usr/local/lib/python3.7/dist-packages/datasets/formatting/formatting.py", line 368, in query_table
    pa_subtable = _query_table(table, key)
  File "/usr/local/lib/python3.7/dist-packages/datasets/formatting/formatting.py", line 84, in _query_table
    return table.fast_slice(key.start, key.stop - key.start)
  File "/usr/local/lib/python3.7/dist-packages/datasets/table.py", line 129, in fast_slice
    i = _interpolation_search(self._offsets, offset)
  File "/usr/local/lib/python3.7/dist-packages/datasets/table.py", line 92, in _interpolation_search
    raise IndexError(f"Invalid query '{x}' for size {arr[-1] if len(arr) else 'none'}.")
IndexError: Invalid query '0' for size 1.
0% 0/1 [00:00<?, ?ba/s]
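A note on reading the last two frames: the message reports query 0 against a size of 1, where the size is the last entry of the offsets list (`arr[-1]` in the raise line), so row 0 looks like it should be a valid index. The sketch below is only a minimal illustration of mapping a row index to a batch through cumulative offsets. It is not the datasets implementation, and `cumulative_offsets` and `find_batch` are hypothetical names. The empty-batch case is an assumption about one way duplicate offsets could arise, not something the traceback confirms.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List


def cumulative_offsets(batch_lengths: List[int]) -> List[int]:
    """Return [0, len(b0), len(b0) + len(b1), ...] for the given batch lengths."""
    return [0] + list(accumulate(batch_lengths))


def find_batch(offsets: List[int], row: int) -> int:
    """Return the index of the batch that contains `row` (hypothetical helper)."""
    # The total number of rows is the last cumulative offset.
    if not offsets or not 0 <= row < offsets[-1]:
        size = offsets[-1] if offsets else "none"
        raise IndexError(f"Invalid query '{row}' for size {size}.")
    # bisect_right skips over duplicate offsets, which is what an empty
    # batch would produce (assumption: empty batches can occur at all).
    return bisect_right(offsets, row) - 1


# One batch holding one row: row 0 lives in batch 0.
print(find_batch(cumulative_offsets([1]), 0))  # 0

# An empty batch followed by a one-row batch: offsets are [0, 0, 1],
# and row 0 lives in batch 1.
print(find_batch(cumulative_offsets([0, 1]), 0))  # 1
```

With this kind of lookup, query 0 for size 1 resolves to a batch in both cases, which suggests (but does not prove) that the error above comes from the lookup itself rather than from a genuinely out-of-range index.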
Do you have any idea what could cause this, or whether there is a workaround?
Sorry, I am a newbie to Hugging Face…
Thank you!