Loss.backward() error after moving model from cpu to cuda

damaggu · May 10, 2023, 2:05am

Hi,

Would someone know how to help with this issue? Thanks!
If in the following code snipped, I change the device from CPU to Cuda it throws the following error:

import torch
from diffusers import AutoencoderKL

device = "cuda"

vae_2 = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse", torch_dtype=torch.float32,)
vae_2.train()
classes = 2
vae_2.to(device)

latents = torch.randn(1, 4, 64, 64).to(torch.float32).to(device)
decoded_mask = vae_2.decode(latents, return_dict=False)[0]
decoded_mask.requires_grad_(True)
target = torch.randn(decoded_mask.shape).to(device)
loss = torch.nn.MSELoss()(decoded_mask, target)

loss.backward()

Error:

│                                                                              │
│   25                                                                         │
│   26 print(loss)                                                             │
│   27                                                                         │
│ ❱ 28 loss.backward()                                                         │
│   29                                                                         │
│   30 print(vae_2.decoder.conv_out.weight.grad)                               │
│   31 print(vae_2.decoder.conv_out.bias.grad)                                 │
│                                                                              │
│ /home/nkondapa/anaconda3/envs/neurips2023_env/lib/python3.8/site-packages/to │
│ rch/_tensor.py:487 in backward                                               │
│                                                                              │
│    484 │   │   │   │   create_graph=create_graph,                            │
│    485 │   │   │   │   inputs=inputs,                                        │
│    486 │   │   │   )                                                         │
│ ❱  487 │   │   torch.autograd.backward(                                      │
│    488 │   │   │   self, gradient, retain_graph, create_graph, inputs=inputs │
│    489 │   │   )                                                             │
│    490                                                                       │
│                                                                              │
│ /home/nkondapa/anaconda3/envs/neurips2023_env/lib/python3.8/site-packages/to │
│ rch/autograd/__init__.py:200 in backward                                     │
│                                                                              │
│   197 │   # The reason we repeat same the comment below is that              │
│   198 │   # some Python versions print out the first line of a multi-line fu │
│   199 │   # calls in the traceback and some print out the last line          │
│ ❱ 200 │   Variable._execution_engine.run_backward(  # Calls into the C++ eng │
│   201 │   │   tensors, grad_tensors_, retain_graph, create_graph, inputs,    │
│   202 │   │   allow_unreachable=True, accumulate_grad=True)  # Calls into th │
│   203                                                                        │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: CUDA error: invalid argument
CUDA kernel errors might be asynchronously reported at some other API call, so
the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.

toyl · June 22, 2023, 1:15am

Did you solve it, please?

fanglixin · May 28, 2024, 7:21am

did you solve it? i miss the same

Topic		Replies	Views
Model cannot backward on GPU but can backward on CPU 🧨 Diffusers	1	196	May 28, 2024
Issue with accelerator.backward(loss) freezing 🤗Accelerate	0	530	January 6, 2024
Expected all tensors to be on the same device. Running base.to("cuda:0") and refiner.to("cuda:1") Model parallism 🧨 Diffusers	2	923	October 6, 2023
Device Type Error With Diffusers Pipeline TAT Beginners	2	1299	August 19, 2023
RuntimeError: Found no NVIDIA driver on your system Spaces	3	1220	October 11, 2022

Loss.backward() error after moving model from cpu to cuda

Related topics