GPU error on LoRA for token classification

I tried to follow the guidelines from [link]. It worked fine on my single-GPU PC, but when I ran the same code on my multi-GPU PC, I got the CUDA error shown below:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper_CUDA__index_select)

I have already checked that all of the model layers are assigned to cuda:0.
I’m wondering whether the peft module does not support multiple GPUs with the Trainer module, or whether there is a way to fix this issue. Thank you.
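
For anyone who wants to repeat the device check described above, here is a minimal sketch. It assumes the model exposes `named_parameters()` and `named_buffers()` the way a `torch.nn.Module` does; the helper name `devices_by_tensor` is made up for illustration and is not part of peft or transformers. Note that the error message complains about the `index` argument of `index_select`, which suggests the input ids of the embedding lookup, not necessarily a model weight, may be the tensor on the other device, so a clean result here does not rule the problem out.

```python
from collections import defaultdict


def devices_by_tensor(model):
    """Group parameter and buffer names by the device they live on.

    `model` is assumed to behave like a torch.nn.Module, i.e. it provides
    named_parameters() and named_buffers(), each yielding (name, tensor)
    pairs where the tensor has a `.device` attribute.
    """
    groups = defaultdict(list)
    for name, tensor in model.named_parameters():
        groups[str(tensor.device)].append(name)
    for name, tensor in model.named_buffers():
        groups[str(tensor.device)].append(name)
    return dict(groups)


# Hypothetical usage with an already loaded model:
#   report = devices_by_tensor(model)
#   for device, names in report.items():
#       print(device, len(names))
```

If the report contains only one device, it may be worth printing the `.device` of each tensor in a batch right before the forward pass as well.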

I have the exact same problem as the OP. If I remove LoRA, the code works (but runs out of memory). With LoRA, I get that error message when using HF Trainer on a single machine with multiple GPUs.

I have a similar issue when using LoRA on T5. I am using the Trainer class on a single machine with multiple GPUs.
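
Since every report in this thread involves Trainer on one machine with several visible GPUs, one diagnostic that might help narrow things down is to hide all but one GPU from the process and see whether the error disappears. The sketch below is only that, a diagnostic and not a confirmed fix, and it gives up multi-GPU training. It assumes that Trainer switches to `torch.nn.DataParallel` when it sees more than one GPU in a non-distributed launch, and the helper name `restrict_to_single_gpu` is hypothetical.

```python
import os


def restrict_to_single_gpu(index=0):
    """Expose only one GPU to this process via CUDA_VISIBLE_DEVICES.

    This must run before CUDA is initialized, so the safest place is the
    very top of the training script, before importing torch.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Hypothetical usage at the top of the training script:
#   restrict_to_single_gpu(0)
#   import torch  # imported only after the variable is set
```

If the run succeeds with a single visible GPU, that would point at the multi-GPU wrapping rather than at LoRA itself.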