Hi everyone! I want to fine-tune my pre-trained Longformer model and am getting this error:
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-54-2f2d9c2c00fc> in <module>()
45 )
46
---> 47 train_results = trainer.train()
6 frames
/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
1032 self.control = self.callback_handler.on_epoch_begin(self.args, self.state, self.control)
1033
-> 1034 for step, inputs in enumerate(epoch_iterator):
1035
1036 # Skip past any already trained steps if resuming training
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in __next__(self)
515 if self._sampler_iter is None:
516 self._reset()
--> 517 data = self._next_data()
518 self._num_yielded += 1
519 if self._dataset_kind == _DatasetKind.Iterable and \
/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
555 def _next_data(self):
556 index = self._next_index() # may raise StopIteration
--> 557 data = self._dataset_fetcher.fetch(index) # may raise StopIteration
558 if self._pin_memory:
559 data = _utils.pin_memory.pin_memory(data)
/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
42 def fetch(self, possibly_batched_index):
43 if self.auto_collation:
---> 44 data = [self.dataset[idx] for idx in possibly_batched_index]
45 else:
46 data = self.dataset[possibly_batched_index]
/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py in <listcomp>(.0)
42 def fetch(self, possibly_batched_index):
43 if self.auto_collation:
---> 44 data = [self.dataset[idx] for idx in possibly_batched_index]
45 else:
46 data = self.dataset[possibly_batched_index]
<ipython-input-53-5e4959dcf50c> in __getitem__(self, idx)
7
8 def __getitem__(self, idx):
----> 9 item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
10 item['labels'] = torch.tensor(self.labels[idx])
11 return item
<ipython-input-53-5e4959dcf50c> in <dictcomp>(.0)
7
8 def __getitem__(self, idx):
----> 9 item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
10 item['labels'] = torch.tensor(self.labels[idx])
11 return item
IndexError: list index out of range
Evidently, it’s a problem with my tokenization, but I can’t pin it down. For training the LM, I made sure the length
argument is set for the tokenizer:
tokenizer = LongformerTokenizerFast.from_pretrained("./ny_model", max_len=3500)
with a hefty 52,000-token vocabulary. Next, when fine-tuning:
train_encodings = tokenizer(list(train_text), truncation=True, padding=True, max_length=3500)
val_encodings = tokenizer(list(val_text), truncation=True, padding=True, max_length=3500)
As you can see, I truncate the sequences. I also tried with some dummy data (ensuring all sequences are of equal length) and hit the same problem.
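Since the IndexError is raised inside `__getitem__` when indexing the encoding lists (or `self.labels`), one thing I can check is whether every list the dataset indexes actually has the same length. This is just a diagnostic sketch with dummy data; `check_lengths` and the dummy variable names are mine, not from my actual code:

```python
# Sanity check: the IndexError in __getitem__ suggests a length mismatch
# between the tokenizer output lists and the labels list, so compare them.
def check_lengths(encodings, labels):
    lengths = {key: len(val) for key, val in encodings.items()}
    lengths["labels"] = len(labels)
    return lengths

# Dummy data: every reported length should be identical.
dummy_encodings = {"input_ids": [[1, 2], [3, 4]],
                   "attention_mask": [[1, 1], [1, 1]]}
dummy_labels = [0]  # one label too few -> __getitem__ would fail at idx 1
print(check_lengths(dummy_encodings, dummy_labels))
# -> {'input_ids': 2, 'attention_mask': 2, 'labels': 1}
```

Running this on my real `train_encodings`/labels would tell me whether the dataset’s length disagrees with the labels list, but I haven’t spotted an obvious mismatch yet.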
So what could the problem be? Any ideas?
Note that I am fine-tuning the model after uploading the LM on Huggingface.
Also, I have attached the code required to train the LM:
Google Colab