Getting IndexError: list index out of range when fine-tuning

Hi everyone! I am trying to fine-tune my pre-trained Longformer model and am getting this error:

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

<ipython-input-54-2f2d9c2c00fc> in <module>()
     45     )
     46 
---> 47 train_results = trainer.train()

/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
   1032             self.control = self.callback_handler.on_epoch_begin(self.args, self.state, self.control)
   1033 
-> 1034             for step, inputs in enumerate(epoch_iterator):
   1035 
   1036                 # Skip past any already trained steps if resuming training

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    515             if self._sampler_iter is None:
    516                 self._reset()
--> 517             data = self._next_data()
    518             self._num_yielded += 1
    519             if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
    555     def _next_data(self):
    556         index = self._next_index()  # may raise StopIteration
--> 557         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    558         if self._pin_memory:
    559             data = _utils.pin_memory.pin_memory(data)

/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
     42     def fetch(self, possibly_batched_index):
     43         if self.auto_collation:
---> 44             data = [self.dataset[idx] for idx in possibly_batched_index]
     45         else:
     46             data = self.dataset[possibly_batched_index]

<ipython-input-53-5e4959dcf50c> in __getitem__(self, idx)
      7 
      8     def __getitem__(self, idx):
----> 9         item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
     10         item['labels'] = torch.tensor(self.labels[idx])
     11         return item

<ipython-input-53-5e4959dcf50c> in <dictcomp>(.0)
      7 
      8     def __getitem__(self, idx):
----> 9         item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
     10         item['labels'] = torch.tensor(self.labels[idx])
     11         return item

IndexError: list index out of range
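
For completeness, the dataset class in cell 53 of the traceback follows the standard torch Dataset pattern from the Hugging Face fine-tuning tutorial. A minimal sketch (the __getitem__ body is verbatim from the traceback above; the class name and the rest are the usual boilerplate):

import torch

class TextDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings = encodings  # BatchEncoding returned by the tokenizer
        self.labels = labels

    def __getitem__(self, idx):
        # This is the line that raises: one of the lists in self.encodings
        # (or self.labels) is shorter than the index being requested.
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        return item

    def __len__(self):
        return len(self.labels)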

Evidently, it’s a problem with my tokenization, but I can’t spot it. For training the LM, I made sure the length argument is set for the tokenizer:

tokenizer = LongformerTokenizerFast.from_pretrained("./ny_model", max_len=3500)

with a hefty 52,000-token vocabulary. Next, when fine-tuning:

train_encodings = tokenizer(list(train_text), truncation=True, padding=True, max_length=3500)
val_encodings = tokenizer(list(val_text), truncation=True, padding=True, max_length=3500)

As you can see, I truncate and pad the sequences. I also tried with some dummy data (making sure all sequences are of equal length), and I get the same problem.
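
If it helps narrow things down, my understanding of the error is that __len__ reports more examples than one of the underlying lists actually holds. A quick length check along these lines should expose any mismatch (a sketch; train_labels stands in for whatever label list gets passed to the dataset):

# Every field produced by the tokenizer, plus the labels, should have
# exactly one entry per training example.
print(len(train_labels))
print({key: len(val) for key, val in train_encodings.items()})

# Manually index the last example; if the lengths agree, this should
# not raise the way __getitem__ does.
last = len(train_labels) - 1
_ = {key: val[last] for key, val in train_encodings.items()}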

So what could the problem be? Any ideas?

Note that I am fine-tuning the model after uploading the LM to Hugging Face.
I have also attached the code used to train the LM:
https://colab.research.google.com/drive/153754DbFXRhKdHvjdSUUp9VSB5JqtZwX?usp=sharing