Attention type 'block_sparse' is not possible if sequence_length: 458 <= num global tokens:

I am using the pre-trained google/bigbird-pegasus-large-arxiv model.

But I receive the following warning during the forward pass.

Attention type 'block_sparse' is not possible if sequence_length: 458 <= num global tokens: 2 * config.block_size + min. num sliding tokens: 3 * config.block_size + config.num_random_blocks * config.block_size + additional buffer: config.num_random_blocks * config.block_size = 704 with config.block_size = 64, config.num_random_blocks = 3. Changing attention type to 'original_full'...
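
For clarity, the 704 threshold is just the sum spelled out in the message; a quick check with the reported values:

```python
# Values reported in the warning above
block_size = 64
num_random_blocks = 3

global_tokens = 2 * block_size                  # 128
min_sliding_tokens = 3 * block_size             # 192
random_tokens = num_random_blocks * block_size  # 192
buffer_tokens = num_random_blocks * block_size  # 192

min_seq_len = global_tokens + min_sliding_tokens + random_tokens + buffer_tokens
print(min_seq_len)  # 704, so a 458-token input falls back to 'original_full'
```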

I understand the message, and I am aware of the time and memory that block_sparse saves compared to original_full.

So, how should I go about selecting a suitable block_size and num_random_blocks when there is a lot of variation in the sequence length of my inputs?
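
For reference, these are the two parameters I would be tuning; a minimal sketch of where they are set at load time (assuming from_pretrained forwards such keyword arguments to the model config, and using example values only):

```python
from transformers import BigBirdPegasusForConditionalGeneration

# block_sparse needs sequence_length > (2 + 3 + 2 * num_random_blocks) * block_size,
# so smaller values lower the threshold: (2 + 3 + 2 * 2) * 32 = 288 here.
model = BigBirdPegasusForConditionalGeneration.from_pretrained(
    "google/bigbird-pegasus-large-arxiv",
    block_size=32,        # example value, default is 64
    num_random_blocks=2,  # example value, default is 3
)
```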

I think one workaround is to pad your sequences to a fixed length, e.g., >= 512/1024/2048, as sketched below.
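
A minimal sketch of that workaround (the padding length is just an example value):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")

texts = ["first long document ...", "second long document ..."]

# Pad every input to a fixed length above the 704-token threshold so that
# block_sparse attention is always used.
inputs = tokenizer(
    texts,
    padding="max_length",
    max_length=1024,
    truncation=True,
    return_tensors="pt",
)
```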

That’s very inefficient, since the sequence length varies from 400 to 10k.

truncation=True in the tokenizer works.
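
For example (max_length is just an example value; this trims over-long inputs rather than padding everything up):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/bigbird-pegasus-large-arxiv")

# Truncate over-long inputs at tokenization time; shorter inputs are left alone
# and the model simply falls back to 'original_full' attention for them.
inputs = tokenizer(
    ["some very long document ..."],
    truncation=True,
    max_length=4096,  # example value; pick what fits your model and memory
    return_tensors="pt",
)
```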


I have encountered the same problem. Do you have a solution now? If I just ignore this warning, will it have an impact?

truncation=True worked for me; did you try it at all?