I’m not familiar with the internals of Funnel sequence classification models. All I know is that it gradually reduces the sequence length and spends the saved FLOPs on a deeper/wider model.

My config looks like this:

```python
config = FunnelConfig(
block_sizes=[3, 3, 3],
d_model=256,
n_head=4,
d_inner=512,
separate_cls=False,
vocab_size=trained_tokenizer.vocab_size,
)
```
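As I understand it, `block_sizes=[3, 3, 3]` means three blocks of three layers each, with the sequence pooled between blocks. A small sketch of the resulting lengths, assuming stride-2 pooling with floor division (my assumption about the pooling arithmetic, not something I've verified in the source):

```python
def pooled_lengths(seq_len, num_blocks):
    """Sequence length seen by each block, assuming stride-2 pooling
    (floor division) between consecutive blocks. A sketch, not the
    actual Funnel implementation."""
    lengths = [seq_len]
    for _ in range(num_blocks - 1):
        seq_len = seq_len // 2  # assumed floor behavior of the pooling
        lengths.append(seq_len)
    return lengths

print(pooled_lengths(658, 3))  # [658, 329, 164]
```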

When I forward a batch of sequences of length 658, I get the following error:

`RuntimeError: The size of tensor a (658) must match the size of tensor b (657) at non-singleton dimension 3`

The error happens where the attention scores are combined: `content_score` has length 658, but `positional_attn` has length 657.

However, the model works if I truncate the sequences to 657 tokens. (I also tried several other even sequence lengths; none of them worked.)
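Since 657 works and the even lengths I tried don't, my current workaround is to pad each batch to an odd length before forwarding. A minimal sketch (`pad_to_odd_length` is a hypothetical helper of mine, not part of the library):

```python
def pad_to_odd_length(ids, pad_id):
    """Append one pad token when the sequence length is even, so the
    resulting length is odd. Hypothetical workaround helper based only
    on the observation that length 657 works and even lengths fail."""
    if len(ids) % 2 == 0:
        return ids + [pad_id]
    return ids

print(len(pad_to_odd_length(list(range(658)), 0)))  # 659
```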

What could the problem be?

Thanks!