Run_backward: expected dtype Float but got dtype Long

Hi,

I am getting an "expected dtype Float but got dtype Long" error on one of my models using transformers. It is raised at line 199, `Variable._execution_engine.run_backward(`, in the torch/autograd package.
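For context, `loss.backward()` here is just a thin wrapper; a minimal sketch of the equivalent call with a stand-in scalar loss (not my actual model):

```python
import torch

loss = torch.tensor(0.0063, requires_grad=True)  # stand-in for the real model loss

# Tensor.backward() with gradient=None becomes a ones-like grad tensor and
# ends up in the run_backward call from the trace; spelled out, it is:
torch.autograd.backward((loss,), grad_tensors=(torch.ones_like(loss),))
```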

Below is the complete stack trace of the error.

```
File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:2745, in Trainer.training_step(self, model, inputs)
   2743 else:
   2744     logger.info(f"loss.dtype={loss.dtype} , loss={loss}.")
→  2745     self.accelerator.backward(loss)
   2747 return loss.detach() / self.args.gradient_accumulation_steps

File /opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py:1910, in Accelerator.backward(self, loss, **kwargs)
   1908 print(f"acc: loss.dtype={loss.dtype} , loss={loss}.")
   1909 logger.info(f"acc: loss.dtype={loss.dtype} , loss={loss}.")
→  1910 loss.backward(**kwargs)

File /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:489, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    479 return handle_torch_function(
    480     Tensor.backward,
    481     (self,),
   (...)
    486     inputs=inputs,
    487 )
    488 print(f"dtype={self.dtype} self={self} gradient={gradient} .")
→   489 torch.autograd.backward(
    490     self, gradient, retain_graph, create_graph, inputs=inputs
    491 )

File /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:199, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    197 print(f"tensors={tensors} , 0_dtype={tensors[0].dtype} .")
    198 print(f"grad_tensors_={grad_tensors_} , 0_dtype={grad_tensors_[0].dtype} .")
→   199 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    200     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    201     allow_unreachable=True, accumulate_grad=True)
```

I did some debugging and logged the value and dtype of the loss tensor at each call in the stack trace:

```
loss.dtype=torch.float32 , loss=0.006293008103966713.                                        (in trainer.py)
acc: loss.dtype=torch.float32 , loss=0.006293008103966713.                                   (in accelerator.py)
dtype=torch.float32 self=0.006293008103966713 gradient=None .                                (in torch/_tensor.py)
tensors=(tensor(0.0063, device='cuda:0', grad_fn=<DivBackward0>),) , 0_dtype=torch.float32 . (in torch/autograd/__init__.py)
grad_tensors_=(tensor(1., device='cuda:0'),) , 0_dtype=torch.float32 .                       (in torch/autograd/__init__.py)
```

The line numbers above may be slightly off, as I added print or logging statements before the function calls.

As you can see, the loss tensor is float32 at every level, yet I am still getting the dtype error. I am puzzled: which tensor does the engine expect to be Float but find to be Long?
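One way to pin this down, for anyone debugging something similar, is anomaly detection, which makes the RuntimeError carry the forward-pass traceback of the op whose backward failed. A sketch, where `compute_loss` and `batch` are hypothetical stand-ins for the actual training step:

```python
import torch

# Re-running the failing step under anomaly detection attaches the forward
# traceback of the offending op to the backward-pass RuntimeError.
with torch.autograd.detect_anomaly():
    loss = compute_loss(batch)  # hypothetical stand-in for the Trainer forward pass
    loss.backward()
```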

Thanks in advance for the help.

Hi, I am getting the same error. Did you find out what the issue is?

Yes. The problem was that our labels contained only one class. The model cannot be trained to distinguish between classes when the data contains a single class, and instead of a clearer message it fails with this dtype error: with a single label, transformers' classification models fall back to the regression path and use MSELoss, which expects Float targets, while our integer labels were Long. Including more than one class in the data fixed it.
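For anyone else who lands here, a minimal sketch of what I believe was happening under the hood (assuming the MSELoss regression fallback; this is not the exact Trainer code):

```python
import torch
import torch.nn as nn

preds = torch.randn(8, requires_grad=True)   # model outputs, float32
labels = torch.zeros(8, dtype=torch.long)    # integer labels, all from a single class

loss = nn.MSELoss()(preds, labels)           # forward succeeds (dtypes are promoted)
try:
    loss.backward()                          # backward rejects the saved Long target
except RuntimeError as e:
    print(e)                                 # e.g. "Found dtype Long but expected Float"

# The fix: supply labels from at least two classes so the model takes the
# classification path (CrossEntropyLoss accepts Long labels), or cast the
# labels to float if regression is really what you want:
nn.MSELoss()(preds, labels.float()).backward()
```

In our case the first fix was the right one: once the dataset contained more than one class, the classification path was used and the Long labels were fine.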