GPU error on LoRA for token classification

yatip · June 1, 2023, 6:01am

I tried to follow the guidelines from https://huggingface.co/docs/peft/task_guides/token-classification-lora. It worked fine on my single-GPU PC, but when I ran the same code on my multi-GPU PC, I got the following CUDA error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument index in method wrapper_CUDA__index_select)

I have already checked that all of the model layers are assigned to cuda:0.
I’m wondering whether the PEFT module does not support multiple GPUs with the Trainer module, or whether there is a way to fix this issue. Thank you.
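
For anyone who wants to repeat the layer check described above, here is a minimal sketch. It assumes `model` is a `torch.nn.Module` (for example the PEFT-wrapped model) and relies only on `named_parameters()` and `named_buffers()`. The helper name `tensors_by_device` is made up for illustration and is not part of PEFT or Transformers.

```python
from collections import defaultdict


def tensors_by_device(model):
    """Group parameter and buffer names by the device they live on.

    `model` is assumed to be a torch.nn.Module; only its
    named_parameters() and named_buffers() methods are used.
    """
    placement = defaultdict(list)
    for name, param in model.named_parameters():
        placement[str(param.device)].append(name)
    for name, buf in model.named_buffers():
        placement[str(buf.device)].append(name)
    return dict(placement)


# Hypothetical usage, just before trainer.train():
#   for device, names in tensors_by_device(model).items():
#       print(device, len(names))
```

Note that this only reports where weights and buffers live. It says nothing about which device the input batches end up on, so a result showing only cuda:0 does not rule out the error above.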

tabacof · June 6, 2023, 2:43pm

I have the exact same problem as the OP. If I remove LoRA, the code works (but runs out of memory). With LoRA, I get that error message when using HF Trainer on a single machine with multiple GPUs.

tyzhu · June 19, 2023, 5:35am

I have a similar issue when using LoRA on T5. I am using the Trainer class on a single machine with multiple GPUs.
