Hi,
I tried to fine-tune ‘bert-base-cased’ on the CoLA task of the GLUE dataset.
Training went fine, but I ran into a problem when performing prediction on the ‘test’ split with the following code:
predictions = trainer.predict(test_split)
I got the following errors:
…/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [0,0,0] Assertion `t >= 0 && t < n_classes` failed.
…/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [1,0,0] Assertion `t >= 0 && t < n_classes` failed.
…/aten/src/ATen/native/cuda/Loss.cu:257: nll_loss_forward_reduce_cuda_kernel_2d: block: [0,0,0], thread: [2,0,0] Assertion `t >= 0 && t < n_classes` failed.
…
RuntimeError: CUDA error: device-side assert triggered
If I switch the prediction to the ‘validation’ split, everything works fine.
My questions are:
- Is this a known bug?
- Is there a good way to track down where this problem comes from?
- I have noticed that for this dataset (GLUE/CoLA), the ‘label’ values in the ‘test’ split differ from those in the ‘train’ and ‘validation’ splits. Do I need to modify these labels before evaluating the model?
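For context, here is a plain-Python sketch (my own illustration, not the actual CUDA kernel) of the range check that the assertion `t >= 0 && t < n_classes` performs. I am assuming `n_classes = 2` for CoLA and that the test split uses -1 as a placeholder label, which would make every test example trip the assert:

```python
def offending_labels(labels, n_classes):
    """Return indices of labels that would fail `t >= 0 && t < n_classes`."""
    return [i for i, t in enumerate(labels) if not (0 <= t < n_classes)]

train_like = [0, 1, 1, 0]   # valid binary CoLA labels
test_like = [-1, -1, -1]    # assumed placeholder labels in the test split

print(offending_labels(train_like, 2))  # no offenders
print(offending_labels(test_like, 2))   # every label offends
```

If that is the cause, dropping or replacing the label column before calling trainer.predict on the test split should avoid the device-side assert.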
Thanks.