Attention type 'block_sparse' is not possible if sequence_length: 458 <= num global tokens:

krishnagarg09 · August 13, 2021, 11:03pm

I am using the pre-trained google/bigbird-pegasus-large-arxiv model.

But I receive the following warning during the forward pass.

Attention type 'block_sparse' is not possible if sequence_length: 458 <= num global tokens: 2 * config.block_size + min. num sliding tokens: 3 * config.block_size + config.num_random_blocks * config.block_size + additional buffer: config.num_random_blocks * config.block_size = 704 with config.block_size = 64, config.num_random_blocks = 3. Changing attention type to 'original_full'...
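The arithmetic in that message can be reproduced by hand. Below is a minimal sketch that only mirrors the terms printed in the warning; the function names are hypothetical and are not part of the transformers API.

```python
def min_block_sparse_length(block_size: int, num_random_blocks: int) -> int:
    """Threshold from the warning text; a sequence must be strictly longer than this."""
    global_tokens = 2 * block_size                   # "num global tokens"
    sliding_tokens = 3 * block_size                  # "min. num sliding tokens"
    random_tokens = num_random_blocks * block_size   # random-block term
    buffer_tokens = num_random_blocks * block_size   # "additional buffer"
    return global_tokens + sliding_tokens + random_tokens + buffer_tokens


def uses_block_sparse(sequence_length: int, block_size: int = 64,
                      num_random_blocks: int = 3) -> bool:
    """True if the sequence is long enough to avoid the fallback described in the warning."""
    return sequence_length > min_block_sparse_length(block_size, num_random_blocks)


print(min_block_sparse_length(64, 3))  # 704, matching the warning
print(uses_block_sparse(458))          # False, so attention falls back to 'original_full'
```

With the values from the warning (block_size = 64, num_random_blocks = 3), the threshold simplifies to (5 + 2 * 3) * 64 = 704, so a 458-token input is too short.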

I understand the warning, and I am aware of the time and memory that block_sparse saves compared to original_full.

So, how should I go about selecting a suitable block_size and num_random_blocks when I know that the sequence length of my inputs varies a lot?
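One way to reason about this is to list the threshold each candidate setting would impose and keep only the settings whose threshold is below the shortest expected input. The sketch below is plain arithmetic based on the formula in the warning; the candidate values and function names are hypothetical, and it says nothing about how changing these settings affects the quality of a pre-trained checkpoint.

```python
def threshold(block_size: int, num_random_blocks: int) -> int:
    """Same formula as in the warning: (2 + 3 + 2 * num_random_blocks) * block_size."""
    return (5 + 2 * num_random_blocks) * block_size


def configs_fitting(min_seq_len: int, block_sizes=(16, 32, 64), random_blocks=(2, 3)):
    """Return (block_size, num_random_blocks, threshold) tuples usable for min_seq_len.

    A setting is kept only if min_seq_len is strictly greater than its threshold.
    Results are sorted with the largest threshold first.
    """
    fits = [
        (bs, nrb, threshold(bs, nrb))
        for bs in block_sizes
        for nrb in random_blocks
        if min_seq_len > threshold(bs, nrb)
    ]
    return sorted(fits, key=lambda c: c[2], reverse=True)


# Shortest input mentioned in this thread is roughly 400 tokens.
for block_size, num_random_blocks, limit in configs_fitting(400):
    print(block_size, num_random_blocks, limit)
```

As far as I know, the chosen values can then be passed as keyword arguments when loading the model, e.g. `from_pretrained("google/bigbird-pegasus-large-arxiv", block_size=16, num_random_blocks=2)`; please check this against the model documentation for your transformers version.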

MattHatter · August 14, 2021, 12:20am

I think one workaround is to pad your sequences to a fixed length above the threshold in the warning (704 here), e.g., 1024 or 2048.
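A variant of this idea is to pad only the inputs that are too short, and only up to just past the threshold, rather than padding everything to one large fixed length. This is a minimal sketch on plain token-id lists; the function names and the pad id of 0 are assumptions for illustration, and rounding up to a multiple of block_size is a convenience choice here, not something the warning requires.

```python
def padded_length(seq_len: int, block_size: int = 64, num_random_blocks: int = 3) -> int:
    """Smallest multiple of block_size above the warning's threshold, or seq_len if already long enough."""
    threshold = (5 + 2 * num_random_blocks) * block_size  # 704 for the defaults
    if seq_len > threshold:
        return seq_len
    return (threshold // block_size + 1) * block_size


def pad_ids(input_ids, pad_token_id: int = 0, block_size: int = 64,
            num_random_blocks: int = 3):
    """Right-pad input_ids and build a matching attention mask (1 = real token, 0 = padding)."""
    target = padded_length(len(input_ids), block_size, num_random_blocks)
    n_pad = target - len(input_ids)
    attention_mask = [1] * len(input_ids) + [0] * n_pad
    return input_ids + [pad_token_id] * n_pad, attention_mask


ids, mask = pad_ids(list(range(1, 459)))  # a 458-token input, as in the warning
print(len(ids), sum(mask))                # 768 458
```

Inputs that already exceed the threshold are returned unchanged, so long documents incur no extra padding from this step.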

krishnagarg09 · August 14, 2021, 12:36am

That’s very inefficient, since the sequence length varies from 400 to 10k tokens.

Setting truncation=True in the tokenizer works.

ccfeidao · September 6, 2021, 4:49am

I have encountered the same problem. Do you have a solution now? If you ignore this warning, will it have an impact?

krishnagarg09 · September 6, 2021, 2:02pm

truncation=True worked for me, did you try it at all?
