Error when Fine-tuning pretrained Masked Language Model

My whole question is here: python - TypeError: zeros_like(): argument 'input' when fine-tuning on MLM - Stack Overflow

Basically, I am getting this error when fine-tuning my pretrained model:

ValueError: expected sequence of length 2033 at dim 1 (got 2036)

Anyone have any idea how I can solve this?

Anyone :disappointed:? I have set padding=True, so this issue should not occur.

tokenizer(batch_sentences, padding=True) - pads to the longest sequence in the batch
Maybe you wanted to use:
tokenizer(batch_sentences, padding='max_length') - pads to the model's maximum input length

This was taken from the docs.
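
To make the difference concrete, here is a minimal sketch (the Longformer checkpoint name is just an example, not necessarily the one you're using):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")  # example checkpoint
batch_sentences = ["a short sentence", "a somewhat longer example sentence"]

# padding=True pads only up to the longest sequence in this batch
batch_padded = tokenizer(batch_sentences, padding=True)

# padding='max_length' pads every sequence to max_length
# (or to the model's maximum input length if max_length is not given)
fixed_padded = tokenizer(batch_sentences, padding="max_length", max_length=32)

print([len(ids) for ids in batch_padded["input_ids"]])  # equal within the batch, batch-dependent
print([len(ids) for ids in fixed_padded["input_ids"]])  # [32, 32]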

Thanks a lot for the reply @Maimonator!! I had missed the max_length argument :hugs:
Now I am getting this new error:

ValueError: expected sequence of length 2000 at dim 1 (got 1981)

Simply put, my tokenization function just doesn't work :frowning: Could you take a look at the code I posted in the StackOverflow link, showing my tok function and how I use dataset.map to apply it to my dataset?

I personally can’t figure out why it doesn’t work

Is this the latest tokenizer function?

def tok(example):
  encodings = tokenizer(example['src'], truncation=True, padding=True)
  return encodings

Try this instead:

def tok(example):
  encodings = tokenizer(example['src'], truncation=True, padding="max_length", max_length=2000)
  return encodings
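
For reference, here is a minimal sketch of how the dataset.map step could look with that function (I'm assuming a datasets.Dataset with a 'src' text column, which is my guess from your code; the toy data below is made up):

from datasets import Dataset

# Hypothetical toy dataset with a 'src' column; assumes `tokenizer` and the
# tok() function above are already defined
dataset = Dataset.from_dict({"src": ["first document ...", "second document ..."]})

# batched=True lets the tokenizer pad/truncate whole batches at once,
# and remove_columns drops the raw text so only model inputs remain
tokenized = dataset.map(tok, batched=True, remove_columns=["src"])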

Let me know if this works for you


I tried the new function, but I get this error:

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

<ipython-input-158-6068ea33d5d4> in <module>()
     45     )
     46 
---> 47 train_results = trainer.train()

11 frames

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1914         # remove once script supports set_grad_enabled
   1915         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1916     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1917 
   1918 

IndexError: index out of range in self

Which most probably means no truncation/padding is being done?

Hmm… I removed those arguments completely to see the new error message, which shows that it does indeed truncate and pad to the model's max input length. So apparently this error is a separate, unrelated one.

Maybe this function is also missing?
Here’s how I use it:
model.resize_token_embeddings(len(tokenizer))
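
For context, here is a minimal sketch of where that call usually goes (the model and tokenizer paths are placeholders, not your actual checkpoints):

from transformers import AutoTokenizer, AutoModelForMaskedLM

tokenizer = AutoTokenizer.from_pretrained("path/to/your-tokenizer")   # placeholder path
model = AutoModelForMaskedLM.from_pretrained("path/to/your-model")    # placeholder path

# If the tokenizer vocabulary is larger than the model's token embedding matrix
# (e.g. because tokens were added after pretraining), any out-of-range token id
# raises "IndexError: index out of range in self" in the embedding lookup.
model.resize_token_embeddings(len(tokenizer))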

I really appreciate your replies in helping me figure out this weird problem :+1: but this still gives the same error as posted above: index out of range in self.

This seems to be a pretty pesky and weird issue :disappointed: I wish there were more comprehensive examples of simple modelling with HF, rather than just SQuAD and the official tasks covered in the existing examples.

I hear you…
I feel the examples and documentation aren't as elaborate as we would have wished.

Also I didn’t mention this explicitly, but I’ve set max_length=2000 in this tokenization function:

def tok(example):
  encodings = tokenizer(example['src'], truncation=True, padding="max_length", max_length=2000)
  return encodings

But you should set it to whatever value makes sense for your data.
I don't have any new ideas, as I'm quite new to this library as well, but please keep us updated!
Hopefully you'll solve it soon :blush:


Day 100 of reporting: still getting this error :frowning:

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

<ipython-input-38-dda642f3d8b6> in <module>()
     47     )
     48 
---> 49 train_results = trainer.train()

11 frames

/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, **kwargs)
   1118                         tr_loss += self.training_step(model, inputs)
   1119                 else:
-> 1120                     tr_loss += self.training_step(model, inputs)
   1121                 self._total_flos += float(self.floating_point_ops(inputs))
   1122 

/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in training_step(self, model, inputs)
   1522                 loss = self.compute_loss(model, inputs)
   1523         else:
-> 1524             loss = self.compute_loss(model, inputs)
   1525 
   1526         if self.args.n_gpu > 1:

/usr/local/lib/python3.7/dist-packages/transformers/trainer.py in compute_loss(self, model, inputs, return_outputs)
   1554         else:
   1555             labels = None
-> 1556         outputs = model(**inputs)
   1557         # Save past state if it exists
   1558         # TODO: this needs to be fixed and made cleaner later.

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/transformers/models/longformer/modeling_longformer.py in forward(self, input_ids, attention_mask, global_attention_mask, head_mask, token_type_ids, position_ids, inputs_embeds, labels, output_attentions, output_hidden_states, return_dict)
   1855             output_attentions=output_attentions,
   1856             output_hidden_states=output_hidden_states,
-> 1857             return_dict=return_dict,
   1858         )
   1859         sequence_output = outputs[0]

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/transformers/models/longformer/modeling_longformer.py in forward(self, input_ids, attention_mask, global_attention_mask, head_mask, token_type_ids, position_ids, inputs_embeds, output_attentions, output_hidden_states, return_dict)
   1662 
   1663         embedding_output = self.embeddings(
-> 1664             input_ids=input_ids, position_ids=position_ids, token_type_ids=token_type_ids, inputs_embeds=inputs_embeds
   1665         )
   1666 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/transformers/models/longformer/modeling_longformer.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds)
    491         if inputs_embeds is None:
    492             inputs_embeds = self.word_embeddings(input_ids)
--> 493         position_embeddings = self.position_embeddings(position_ids)
    494         token_type_embeddings = self.token_type_embeddings(token_type_ids)
    495 

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

/usr/local/lib/python3.7/dist-packages/torch/nn/modules/sparse.py in forward(self, input)
    156         return F.embedding(
    157             input, self.weight, self.padding_idx, self.max_norm,
--> 158             self.norm_type, self.scale_grad_by_freq, self.sparse)
    159 
    160     def extra_repr(self) -> str:

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   1914         # remove once script supports set_grad_enabled
   1915         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 1916     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   1917 
   1918 

IndexError: index out of range in self

Still trying to get out of this…
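
Since the traceback above fails inside self.position_embeddings, one thing worth checking is whether max_length=2000 (plus special tokens) actually fits inside the model's position embedding table. A minimal diagnostic sketch, assuming `model` and `tokenizer` are whatever you already have loaded:

# Size of the position embedding table vs. the length the tokenizer pads/truncates to
print(model.config.max_position_embeddings)
print(tokenizer.model_max_length)

# Note: RoBERTa/Longformer-style embeddings offset position ids past the padding
# index, so the usable sequence length is slightly smaller than
# max_position_embeddings. If the padded length reaches that limit, position_ids
# index past the table and raise "IndexError: index out of range in self".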

This worked for me, using the length from the error message I was getting. Thanks, Maimonator!

Seems like this one fixed my problem! Thanks!

Hi @AlexKay, how did you fix this issue?