IndexError: index out of bounds

Hi, I am trying to further pretrain “allenai/scibert_scivocab_uncased” on my own dataset using MLM. I am using the following command:

python3 ./transformers/examples/language-modeling/ --model_name_or_path "allenai/scibert_scivocab_uncased" --train_file train.txt --validation_file validation.txt --do_train --do_eval --output_dir test1 --overwrite_cache --cache_dir ./tt
However, I am getting the following error:

 0% 0/240 [00:00<?, ?ba/s]
Traceback (most recent call last):
  File "./transformers/examples/language-modeling/", line 409, in <module>
  File "./transformers/examples/language-modeling/", line 355, in main
    load_from_cache_file=not data_args.overwrite_cache,
  File "/usr/local/lib/python3.6/dist-packages/datasets/", line 303, in map
    for k, dataset in self.items()
  File "/usr/local/lib/python3.6/dist-packages/datasets/", line 303, in <dictcomp>
    for k, dataset in self.items()
  File "/usr/local/lib/python3.6/dist-packages/datasets/", line 1259, in map
  File "/usr/local/lib/python3.6/dist-packages/datasets/", line 157, in wrapper
    out: Union["Dataset", "DatasetDict"] = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/datasets/", line 163, in wrapper
    out = func(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/datasets/", line 1528, in _map_single
  File "/usr/local/lib/python3.6/dist-packages/datasets/", line 278, in write_batch
    pa_table = pa.Table.from_pydict(typed_sequence_examples)
  File "pyarrow/table.pxi", line 1474, in pyarrow.lib.Table.from_pydict
  File "pyarrow/array.pxi", line 322, in pyarrow.lib.asarray
  File "pyarrow/array.pxi", line 222, in pyarrow.lib.array
  File "pyarrow/array.pxi", line 110, in pyarrow.lib._handle_arrow_array_protocol
  File "/usr/local/lib/python3.6/dist-packages/datasets/", line 100, in __arrow_array__
    if trying_type and out[0].as_py() !=[0]:
  File "pyarrow/array.pxi", line 1058, in pyarrow.lib.Array.__getitem__
  File "pyarrow/array.pxi", line 540, in pyarrow.lib._normalize_index
IndexError: index out of bounds

Can someone help me understand this problem and how to resolve it? When I try the same command with bert-base-uncased, it runs fine. Also, what is the best practice for further pretraining a model on a custom dataset?

Any progress on this? I’m currently facing the same issue.
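
One possible cause, offered as an assumption since nothing in this thread confirms it: the traceback fails on `out[0]`, which is what you would see if a mapped batch came back empty. That can happen when the preprocessing concatenates the tokenized text and cuts it into chunks of a maximum sequence length, and that length is far larger than the batch (for example, when the tokenizer does not define a sensible maximum length and a very large default is used). The sketch below is a simplified, hypothetical stand-in for such a chunking step, not the code of the script in the command above; `group_texts` and the token IDs are made up for illustration.

```python
from itertools import chain


def group_texts(examples, max_seq_length):
    """Concatenate tokenized examples and split them into fixed-size chunks.

    Hypothetical helper for illustration only.
    """
    # Flatten the list of token lists in every column.
    concatenated = {k: list(chain.from_iterable(v)) for k, v in examples.items()}
    total_length = len(next(iter(concatenated.values())))
    # Drop the remainder so every chunk has exactly max_seq_length tokens.
    total_length = (total_length // max_seq_length) * max_seq_length
    return {
        k: [t[i:i + max_seq_length] for i in range(0, total_length, max_seq_length)]
        for k, t in concatenated.items()
    }


# Made-up token IDs standing in for two tokenized lines of text.
batch = {"input_ids": [[101, 7, 8, 102], [101, 9, 102]]}

# A reasonable chunk size yields non-empty chunks.
small = group_texts(batch, max_seq_length=3)
print(small)

# An extremely large chunk size yields no chunks at all, so the column is empty.
huge = group_texts(batch, max_seq_length=10**30)
print(huge)
```

If this is what is happening, it may be worth printing `tokenizer.model_max_length` for both models and comparing the values. If the SciBERT value is extremely large, setting the maximum sequence length explicitly (for example to 512, if your script exposes such an option) might avoid the empty batch. That would also be consistent with bert-base-uncased running fine, but I have not verified it against your setup.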