Prediction on the GLUE CoLA test split

Hi,

I tried to fine-tune ‘bert-base-cased’ on the CoLA task from the GLUE benchmark.
Training ran fine, but prediction on the ‘test’ split failed when I ran the following code:

predictions = trainer.predict(test_split)

I got the following errors:

…/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion t >= 0 && t < n_classes failed.
…/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion t >= 0 && t < n_classes failed.

RuntimeError: CUDA error: device-side assert triggered

If I run the prediction on the ‘validation’ split instead, everything works.
My questions are:

  1. Is this a known bug?
  2. Is there a good way to find out where this problem comes from?
  3. I have seen that for this dataset (GLUE, CoLA), the ‘label’ values provided in the ‘test’ split differ from those in the ‘train’ and ‘validation’ splits. Do I need to modify these labels before running prediction on the ‘test’ split?
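
Regarding questions 2 and 3, here is a minimal sketch of the kind of check I have in mind. The assertion in the log requires every target t to satisfy 0 <= t < n_classes, so counting the label values outside that range for each split should show whether the labels are the cause. The helper name out_of_range_labels and the sample label lists are hypothetical and only for illustration; with the real data, the list would presumably be the split's ‘label’ column (for example test_split["label"], assuming a Hugging Face datasets split).

```python
from collections import Counter


def out_of_range_labels(labels, n_classes):
    """Count label values that violate 0 <= t < n_classes."""
    return Counter(t for t in labels if not (0 <= t < n_classes))


# Hypothetical label columns, for illustration only (not the real data).
validation_labels = [1, 0, 1, 1, 0]
test_labels = [-1, -1, -1]

n_classes = 2  # CoLA is a binary classification task

print(out_of_range_labels(validation_labels, n_classes))  # Counter()
print(out_of_range_labels(test_labels, n_classes))        # Counter({-1: 3})
```

If the ‘test’ split reports values such as -1 (as far as I can tell, the GLUE test labels are hidden and stored as a placeholder), the loss computed during prediction would receive an invalid class index, which would match the failed assertion above.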

Thanks.