I'm trying to follow the instructions on this page:
and I encounter an error when I try to train:
RuntimeError: "nll_loss_forward_reduce_cuda_kernel_2d_index" not implemented for 'Int'
the error is in this function:
return torch._C._nn.cross_entropy_loss(input, target, weight, _Reduction.get_enum(reduction), ignore_index, label_smoothing)
I tried casting the labels to int64 and got the same error (the cast itself works; when I cast to float32 instead, the error message changes to 'Float' rather than 'Int').
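For context, PyTorch's cross-entropy loss expects class-index targets to have dtype int64 (torch.long); any other integer dtype raises a "not implemented for 'Int'"-style RuntimeError. A minimal sketch (standalone, not the Trainer code from this thread) that reproduces the dtype problem and the cast that fixes it:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(4, 10)  # (batch, num_classes)
# Targets created with the "wrong" dtype, as happens with some datasets:
target = torch.randint(0, 10, (4,), dtype=torch.int32)

# F.cross_entropy requires class-index targets to be int64 (torch.long);
# an int32 target raises a RuntimeError (the exact message differs
# between the CPU and CUDA kernels).
try:
    F.cross_entropy(logits, target)
except RuntimeError as e:
    print(e)

# Casting the target to int64 makes the call succeed.
loss = F.cross_entropy(logits, target.long())
print(loss.dtype)  # torch.float32
```

Note that the cast has to reach the tensor the loss actually sees; casting a copy (or casting before a collator rebuilds the batch) can silently leave the original int32 target in place.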
I'm running with:
Windows Server 2022
CUDA 11.7 (the GPU is an A10)
I've found a workaround: I installed WSL and ran everything from there, and it works.
I've run into the same issue in a similar environment (Windows 11). In my case, even switching to the CPU (setting the TrainingArguments parameter no_cuda to True) resulted in the same kind of error. Calling with_format('torch') on the datasets passed to the Trainer made it work both with and without CUDA enabled. Example below:
trainer = Trainer(
Hope this helps.