Number of dims don't match in permute in BERT

I’m facing a peculiar issue while running evaluation on HANS. I have used this script more than a hundred times at this point, but this error has never appeared before.

When I run it, I get an error in transpose_for_scores: RuntimeError: number of dims don't match in permute. I’ve inspected the shape of the input; it is torch.Size([128]). That is the same as an MNLI data sample from run_glue, which works fine. I tried older versions of transformers and get the same error.

I tried other models such as roberta-base and they work. I am not sure why this problem occurs, because the attention weight tensor has 4 dimensions: [bs, n_heads, seq_length, 64]. Can anyone suggest a workaround to fix this?
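For reference, this is how the error itself can be reproduced in plain PyTorch, independent of transformers: permute requires exactly as many indices as the tensor has dimensions, so calling it with 4 indices on a 1-D tensor (like a torch.Size([128]) input that is missing its batch dimension) raises this exact RuntimeError. The shapes below are hypothetical, chosen to mirror the [bs, n_heads, seq_length, 64] layout described above; the unsqueeze(0) workaround at the end is an assumption about the cause, not a confirmed fix.

```python
import torch

# 4-D tensor shaped like the attention weights: (bs, n_heads, seq_length, head_dim)
x = torch.randn(2, 12, 128, 64)
y = x.permute(0, 2, 1, 3)  # fine: 4 indices for a 4-D tensor
print(y.shape)             # torch.Size([2, 128, 12, 64])

# 1-D tensor, like an input of torch.Size([128]) with no batch dimension
z = torch.randn(128)
try:
    z.permute(0, 2, 1, 3)  # 4 indices for a 1-D tensor
except RuntimeError as e:
    print(e)               # number of dims don't match in permute

# Hypothetical workaround if the model is being fed a single unbatched
# example: add a batch dimension so the input is (1, seq_length).
z_batched = z.unsqueeze(0)
print(z_batched.shape)     # torch.Size([1, 128])
```

If the tokenizer or data collator is handing the model a single 1-D example instead of a batched 2-D tensor, unsqueezing (or batching the inputs) would be the first thing to check.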
