BigBirdModel: problem running the code provided in the documentation

Hey folks, quick question: has anyone tried running the code provided in the BigBird documentation and run into problems? I’m simply trying to embed some input using the pre-trained model for initial exploration, and I’m running into an error: IndexError: index out of range in self

Has anyone come across this error before or seen a fix for it? Thanks.
Full stack trace below:

IndexError Traceback (most recent call last)
<ipython-input-...> in <module>
5
6 inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
----> 7 outputs = model(**inputs)
8 outputs

~/SageMaker/persisted_conda_envs/intercom_kevin/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~/SageMaker/persisted_conda_envs/intercom_kevin/lib/python3.6/site-packages/transformers/models/big_bird/modeling_big_bird.py in forward(self, input_ids, attention_mask, token_type_ids, position_ids, head_mask, inputs_embeds, encoder_hidden_states, encoder_attention_mask, past_key_values, use_cache, output_attentions, output_hidden_states, return_dict)
2076 token_type_ids=token_type_ids,
2077 inputs_embeds=inputs_embeds,
--> 2078 past_key_values_length=past_key_values_length,
2079 )
2080

~/SageMaker/persisted_conda_envs/intercom_kevin/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~/SageMaker/persisted_conda_envs/intercom_kevin/lib/python3.6/site-packages/transformers/models/big_bird/modeling_big_bird.py in forward(self, input_ids, token_type_ids, position_ids, inputs_embeds, past_key_values_length)
283
284 if inputs_embeds is None:
--> 285 inputs_embeds = self.word_embeddings(input_ids)
286
287 if self.rescale_embeddings:

~/SageMaker/persisted_conda_envs/intercom_kevin/lib/python3.6/site-packages/torch/nn/modules/module.py in _call_impl(self, *input, **kwargs)
720 result = self._slow_forward(*input, **kwargs)
721 else:
--> 722 result = self.forward(*input, **kwargs)
723 for hook in itertools.chain(
724 _global_forward_hooks.values(),

~/SageMaker/persisted_conda_envs/intercom_kevin/lib/python3.6/site-packages/torch/nn/modules/sparse.py in forward(self, input)
124 return F.embedding(
125 input, self.weight, self.padding_idx, self.max_norm,
--> 126 self.norm_type, self.scale_grad_by_freq, self.sparse)
127
128 def extra_repr(self) -> str:

~/SageMaker/persisted_conda_envs/intercom_kevin/lib/python3.6/site-packages/torch/nn/functional.py in embedding(input, weight, padding_idx, max_norm, norm_type, scale_grad_by_freq, sparse)
1812 # remove once script supports set_grad_enabled
1813 no_grad_embedding_renorm(weight, input, max_norm, norm_type)
→ 1814 return torch.embedding(weight, input, padding_idx, scale_grad_by_freq, sparse)
1815
1816

IndexError: index out of range in self

cc @vasudevgupta

hi @khmcnally,

I was running this:

from transformers import BigBirdTokenizer, BigBirdModel

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
m = BigBirdModel.from_pretrained("google/bigbird-roberta-base")
sample = tokenizer("Hello, my dog is cute", return_tensors="pt")
m(**sample)

and it’s working for me. Are you using the model in some other configuration? If so, please share it and I will try to run it.


Thanks so much for the quick reply. I’m simply running this:

from transformers import BigBirdTokenizer, BigBirdModel

tokenizer = BigBirdTokenizer.from_pretrained('google/bigbird-roberta-base')
model = BigBirdModel.from_pretrained('google/bigbird-roberta-base')

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs)
outputs

Perhaps it’s my dev configuration that’s the issue? I’m running my own conda environment with transformers v4.5.1, if that helps?

Edit: There’s also some logging provided that may be useful context:

Attention type 'block_sparse' is not possible if sequence_length: 8 <= num global tokens: 2 * config.block_size + min. num sliding tokens: 3 * config.block_size + config.num_random_blocks * config.block_size + additional buffer: config.num_random_blocks * config.block_size = 704 with config.block_size = 64, config.num_random_blocks = 3.Changing attention type to 'original_full'...

So, this issue will be fixed if you work with either the latest pip version or the master branch.

It’s fine to have the warning you mentioned. Since your sequence length is very small, :hugs: will shift your model’s attention_type to "original_full". To use block sparse attention, you will have to input a long sequence that follows the rule described in the warning.
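
For reference, a minimal sketch of both routes, assuming attention_type, block_size, and num_random_blocks are standard BigBirdConfig options that from_pretrained forwards to the config (this sketch is illustrative, not from the original posts):

from transformers import BigBirdModel, BigBirdTokenizer

tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")

# Option 1: request full attention up front for short inputs, so the model
# never needs to switch attention types at runtime.
model_full = BigBirdModel.from_pretrained(
    "google/bigbird-roberta-base", attention_type="original_full"
)
short = tokenizer("Hello, my dog is cute", return_tensors="pt")
out_short = model_full(**short)

# Option 2: keep block sparse attention by feeding a sequence longer than the
# threshold quoted in the warning (704 tokens with block_size=64, num_random_blocks=3).
model_sparse = BigBirdModel.from_pretrained("google/bigbird-roberta-base")
long_text = " ".join(["Hello, my dog is cute."] * 200)  # well over 704 tokens
long = tokenizer(long_text, return_tensors="pt")
out_long = model_sparse(**long)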

Great Vasudev, thanks so much for your help here! Pardon my ignorance, but do I update to the latest version of the transformers package, or is it some other package I should update? Which version should I be using exactly? Is there a later version than 4.5.1?

# To install the latest release, run this
pip3 install transformers==4.5.1

## or

# if you want to work with the master branch
pip3 install git+https://github.com/huggingface/transformers.git@master
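
After reinstalling, a quick sanity check (an illustrative sketch, not from the thread) is to confirm the installed version and rerun the short example:

# Confirm the installed version and that the short example now runs
# without the IndexError.
import transformers
print(transformers.__version__)  # should print 4.5.1 or later

from transformers import BigBirdTokenizer, BigBirdModel
tokenizer = BigBirdTokenizer.from_pretrained("google/bigbird-roberta-base")
model = BigBirdModel.from_pretrained("google/bigbird-roberta-base")
outputs = model(**tokenizer("Hello, my dog is cute", return_tensors="pt"))
print(outputs.last_hidden_state.shape)  # e.g. torch.Size([1, 8, 768])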

Thanks so much Vasudev, that worked perfectly!


Hi @vasudevgupta (appreciate I’m cheekily @-ing you!)

The above runs fine, but the shape of the output differs depending on the length of the text.

For example, if I use the string 'hello' and call .shape on the returned tensor, the shape is [1, 3, 768].
But if I use 'hello my dog is cute', the shape is [1, 7, 768].

Perhaps this is expected behaviour? I have experience using various language models in TensorFlow and they tend to generate vectors with the same number of dimensions. Perhaps there is something really fundamental or obvious I’m missing here?

For context, I’m exploring matching conversational type questions to knowledge base articles, and I wanted to test this language model for suitability. So I’m trying to calculate cosine similarity between embeddings of questions and those of articles.

Any help would be much appreciated! My guess is I have to use one of the hidden layers as embeddings but I’m not sure.

hi, this is happening because the :hugs: tokenizer adds the [CLS] and [SEP] tokens, so the input sequence length is 3, not 1.
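
A quick way to see this, assuming the standard tokenizer helpers (convert_ids_to_tokens) and the tokenizer/model objects from the snippet above:

# Illustrative sketch: inspect what the tokenizer actually produces for 'hello'.
ids = tokenizer("hello")["input_ids"]
print(tokenizer.convert_ids_to_tokens(ids))
# expected to look something like: ['[CLS]', '▁hello', '[SEP]']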

Ah okay, so I just use the CLS token, grabbing it with something like this?
embeds[0].detach().numpy()[:,0,:]
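
For the question/article matching use case mentioned above, here is a minimal illustrative sketch using the [CLS] embedding (the embed helper and the example texts are hypothetical, and mean pooling over tokens is a common alternative):

import torch

def embed(text):
    # Hypothetical helper: return the [CLS] embedding for one text.
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # shape [1, 768]

question = "How do I reset my password?"
article = "To reset your password, open the settings page and follow the steps."
score = torch.nn.functional.cosine_similarity(embed(question), embed(article))
print(score.item())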

Piling onto this thread: I’m fine with the automatic change to full attention, but is there a way to suppress the warning? Also, I’m assuming that "original_full" doesn’t mean the model is fundamentally the same as a standard bidirectional model like BERT, but is that correct? (I guess I’m not totally clear on the difference between sparse attention and "block_sparse" attention.)