Hi everyone (noob here)! I am currently trying to finetune Llama 3 using QLoRA and want to run the training on two GPUs in parallel. I have tried in vain to do this using torchrun and always get the following error:
ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example `device_map={'':torch.cuda.current_device() or device_map={'':torch.xpu.current_device()}`
I have, of course, changed the device mapping to the current CUDA device, but that did not help; I still get the exact same error.
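Concretely, this is the kind of device mapping I have been trying to pass to `from_pretrained` (just a sketch; the model id and quantization config below are placeholders, not my exact script). My understanding is that under torchrun each worker process should pin the entire model to its own GPU via the `LOCAL_RANK` environment variable:

```python
import os

def per_rank_device_map():
    # torchrun sets LOCAL_RANK for each worker process (0, 1, ...).
    # Mapping the empty-string key pins the *entire* model to that
    # single GPU, which is what the error message seems to ask for.
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    return {"": local_rank}

# The map then goes to from_pretrained, roughly like this:
# model = AutoModelForCausalLM.from_pretrained(
#     "meta-llama/Meta-Llama-3-8B",      # placeholder model id
#     quantization_config=bnb_config,    # my QLoRA BitsAndBytesConfig
#     device_map=per_rank_device_map(),
# )
```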
Any help or resources on the matter would be greatly appreciated!