Hi,
I am getting an "Expected float" error on one of my models when training with transformers. The error is raised at line 199, `Variable._execution_engine.run_backward(...)`, in the torch/autograd package.
Below is the complete stack trace of the error.
File /opt/conda/lib/python3.10/site-packages/transformers/trainer.py:2745, in Trainer.training_step(self, model, inputs)
2743 else:
2744 logger.info(f"loss.dtype={loss.dtype} , loss={loss}\n.")
→ 2745 self.accelerator.backward(loss)
2747 return loss.detach() / self.args.gradient_accumulation_steps
File /opt/conda/lib/python3.10/site-packages/accelerate/accelerator.py:1910, in Accelerator.backward(self, loss, **kwargs)
1908 print(f"acc: loss.dtype={loss.dtype} , loss={loss}
.“)
1909 logger.info(f"acc: loss.dtype={loss.dtype} , loss={loss}
.”)
→ 1910 loss.backward(**kwargs)
File /opt/conda/lib/python3.10/site-packages/torch/_tensor.py:489, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
479 return handle_torch_function(
480 Tensor.backward,
481 (self,),
(…)
486 inputs=inputs,
487 )
488 print(f"dtype={self.dtype} self={self} gradient={gradient}
.")
→ 489 torch.autograd.backward(
490 self, gradient, retain_graph, create_graph, inputs=inputs
491 )
File /opt/conda/lib/python3.10/site-packages/torch/autograd/__init__.py:199, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
197 print(f"tensors={tensors} , 0_dtype={tensors[0].dtype}
.“)
198 print(f"grad_tensors_={grad_tensors_} , 0_dtype={grad_tensors_[0].dtype}
.”)
→ 199 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
200 tensors, grad_tensors, retain_graph, create_graph, inputs,
201 allow_unreachable=True, accumulate_grad=True)
I did some debugging and logged the tensor values at each call in the stack trace:
loss.dtype=torch.float32 , loss=0.006293008103966713 (in trainer.py)
acc: loss.dtype=torch.float32 , loss=0.006293008103966713 (in accelerator.py)
dtype=torch.float32 self=0.006293008103966713 gradient=None (in torch/_tensor.py)
The two lines below are from the last frame, in torch/autograd/__init__.py:
tensors=(tensor(0.0063, device='cuda:0', grad_fn=<DivBackward0>),) , 0_dtype=torch.float32
grad_tensors_=(tensor(1., device='cuda:0'),) , 0_dtype=torch.float32
The line numbers above may be slightly off because I added print/logging statements before the function calls (roughly like the snippet below).
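For reference, the kind of logging I added looks roughly like this, shown here as a self-contained sketch on a dummy loss rather than inside the actual Trainer/Accelerate code paths:

```python
import torch

# Sketch of the dtype logging I inserted along the call chain
# (trainer.py, accelerator.py, torch/_tensor.py, torch/autograd/__init__.py),
# applied to a dummy loss here instead of the real model's loss.
x = torch.randn(4, requires_grad=True)
loss = (x ** 2).mean()
print(f"loss.dtype={loss.dtype} , loss={loss}")  # prints torch.float32, as in my logs above
loss.backward()
```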
As you can see, the tensor values are always float, yet I am getting a type error, which puzzles me. Which variable is the engine expecting to be float but finding to be long?
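In case it helps narrow things down, is something like the check below the right thing to run? It is only a minimal sketch (the helper name and the placeholder model are mine, not from transformers or accelerate) that scans a model's parameters and buffers for anything that is not floating point:

```python
import torch.nn as nn

def audit_dtypes(model: nn.Module) -> None:
    """Print every parameter or buffer whose dtype is not floating point."""
    for name, p in model.named_parameters():
        if not p.dtype.is_floating_point:
            print(f"non-float parameter: {name} dtype={p.dtype}")
    for name, b in model.named_buffers():
        if not b.dtype.is_floating_point:
            print(f"non-float buffer: {name} dtype={b.dtype}")

# Placeholder model just to make the sketch runnable; in my case this would be
# the model I pass to the Trainer.
audit_dtypes(nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2)))
```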
Thanks in advance for the help.