BlenderBot forward method crashing

I am trying to use BlenderbotForConditionalGeneration and I'm getting the following error:

---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-51-b8db07c0c647> in <module>()
      2     input_ids = encoding['input_ids'],
      3     attention_mask = encoding['attention_mask'],
----> 4     labels=labels)

/usr/local/lib/python3.7/dist-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
   2181         # remove once script supports set_grad_enabled
   2182         _no_grad_embedding_renorm_(weight, input, max_norm, norm_type)
-> 2183     return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
   2184 
   2185 

IndexError: index out of range in self

Actual Code


from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

MODEL_NAME = "facebook/blenderbot-400M-distill"
tokenizer = BlenderbotTokenizer.from_pretrained(MODEL_NAME)

encoding = tokenizer(
    sample_question['question'],
    sample_question['context'],
    max_length=1024,
    padding='max_length',
    truncation="only_second",
    return_attention_mask=True,
    add_special_tokens=True,
    return_tensors="pt"
)

answer_encoding = tokenizer(
    sample_question['answer_text'],
    max_length=1024,
    padding='max_length',
    truncation=True,
    return_attention_mask=True,
    add_special_tokens=True,
    return_tensors="pt"
)
labels = answer_encoding["input_ids"]


model = BlenderbotForConditionalGeneration.from_pretrained(MODEL_NAME, return_dict=True)

output = model(
    input_ids=encoding['input_ids'],
    attention_mask=encoding['attention_mask'],
    labels=labels
)

As it crashes during the embedding lookup, did you check that the vocabulary file is the correct one?
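One quick way to sanity-check that is to compare the tokenizer's vocabulary size against the size of the model's input-embedding matrix, e.g.:

# If the tokenizer can emit ids >= the embedding size, the lookup will fail.
print(len(tokenizer))                               # tokenizer vocabulary size
print(model.get_input_embeddings().num_embeddings)  # rows in the input-embedding matrix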

Hi @thies!

Thanks for checking. I got it resolved over Discord. The issue was that the tokenizer's max_length was wrong: this model's position-embedding table only has model.config.max_position_embeddings entries (128 for this checkpoint), so padding the inputs out to 1024 produces position indices that are out of range, which is what triggers the IndexError in the embedding lookup. The best way to find the right value is to check model.config.max_position_embeddings.
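For example, you can load just the config (without the model weights) and read it off:

from transformers import AutoConfig

config = AutoConfig.from_pretrained("facebook/blenderbot-400M-distill")
print(config.max_position_embeddings)  # 128 for facebook/blenderbot-400M-distill

Then keep the tokenizer's max_length at or below that value: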

tokenizer(
    data_row['question'],
    data_row['context'],
    max_length=128,              # == model.config.max_position_embeddings
    padding='max_length',
    truncation='only_second',
)

If anyone is interested, the notebook link is here.
I hard-coded max_length to 128 after checking model.config.
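For anyone who doesn't want to open the notebook, here is a rough, self-contained sketch of the corrected call. The strings are placeholders standing in for sample_question['question'] / ['context'] / ['answer_text'], and the -100 masking of padded label positions is optional good practice rather than part of the original fix:

from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration

MODEL_NAME = "facebook/blenderbot-400M-distill"
tokenizer = BlenderbotTokenizer.from_pretrained(MODEL_NAME)
model = BlenderbotForConditionalGeneration.from_pretrained(MODEL_NAME)

# Never pad/truncate beyond the model's position-embedding table.
max_len = model.config.max_position_embeddings  # 128 for this checkpoint

encoding = tokenizer(
    "What time does the shop open?",        # placeholder question
    "The shop opens at 9am every day.",     # placeholder context
    max_length=max_len,
    padding="max_length",
    truncation="only_second",
    return_tensors="pt",
)

labels = tokenizer(
    "It opens at 9am.",                     # placeholder answer
    max_length=max_len,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)["input_ids"]

# Optional: don't compute loss on padding tokens.
labels[labels == tokenizer.pad_token_id] = -100

output = model(
    input_ids=encoding["input_ids"],
    attention_mask=encoding["attention_mask"],
    labels=labels,
)
print(output.loss)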