Llama 3 PEFT with DDP

Hi everyone (noob here)! I'm trying to fine-tune Llama 3 with QLoRA on two GPUs in parallel. I've tried to do this with torchrun without success, and I always get the following error:

ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example `device_map={'':torch.cuda.current_device()}` or `device_map={'':torch.xpu.current_device()}`

I have, of course, set the device map to the current CUDA device, but that didn't help: I still get exactly the same error.
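For reference, one common cause of this error under torchrun (an assumption here, since the training script isn't shown) is reading torch.cuda.current_device() before each process has been pinned to its own GPU. Every rank then gets 0, so the model lands on cuda:0 while the Trainer on rank 1 expects cuda:1. The sketch below derives the device map from the LOCAL_RANK environment variable that torchrun sets for each worker. The helper names local_device_map and load_quantized_model are hypothetical, and the 4-bit settings are illustrative QLoRA-style values, not a tested recipe.

```python
import os


def local_device_map() -> dict:
    # torchrun exports LOCAL_RANK (0, 1, ...) to every worker process.
    # Defaulting to 0 lets the same script also run without torchrun.
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    # "" refers to the root module, i.e. put the whole model on this GPU.
    return {"": local_rank}


def load_quantized_model(model_id: str):
    # Heavy imports stay inside the function so local_device_map() can be
    # used (and tested) on a machine without a GPU.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    device_map = local_device_map()
    # Pin this process to its own GPU before any other CUDA work, so that
    # torch.cuda.current_device() agrees with LOCAL_RANK from here on.
    torch.cuda.set_device(device_map[""])

    # Illustrative 4-bit (QLoRA-style) quantization settings.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        device_map=device_map,
    )
```

In a script launched with torchrun --nproc_per_node=2, each rank would call load_quantized_model with the Llama 3 checkpoint, apply the LoRA adapters, and hand the model to the Trainer as before. Each GPU then holds its own full 4-bit copy, which is what DDP expects.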

Any help or resources on the matter would be greatly appreciated!


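I'd recommend this guide: Efficiently fine-tune Llama 3 with PyTorch FSDP and Q-Lora. It walks through fine-tuning Llama 3 with QLoRA using FSDP (Fully Sharded Data Parallel), PyTorch's sharded alternative to DDP: instead of keeping a full copy of the model on every GPU, FSDP shards the parameters, gradients, and optimizer states across them.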
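As a rough sketch of what changes with FSDP (assumptions throughout, not code taken from the guide): the model is loaded without a device_map so that FSDP handles placement, and bnb_4bit_quant_storage is set to the same dtype as the unquantized layers so FSDP can flatten and shard the 4-bit weights together with everything else. The helper names fsdp_training_kwargs and load_model_for_fsdp are hypothetical, and the guide's own configuration includes further FSDP settings that this sketch omits.

```python
def fsdp_training_kwargs(output_dir: str) -> dict:
    # Keyword arguments intended for transformers.TrainingArguments.
    return {
        "output_dir": output_dir,
        "bf16": True,
        "gradient_checkpointing": True,
        # Shard parameters, gradients and optimizer state across all ranks,
        # wrapping sub-modules automatically.
        "fsdp": "full_shard auto_wrap",
    }


def load_model_for_fsdp(model_id: str):
    # Heavy imports kept local so the kwargs helper works without a GPU.
    import torch
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    storage_dtype = torch.bfloat16
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_use_double_quant=True,
        bnb_4bit_compute_dtype=storage_dtype,
        # Store the packed 4-bit weights in the same dtype as the
        # unquantized layers so FSDP can shard them uniformly.
        bnb_4bit_quant_storage=storage_dtype,
    )
    # No device_map: FSDP, not from_pretrained, decides device placement.
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=bnb_config,
        torch_dtype=storage_dtype,
    )
```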

There's also the Alignment Handbook, which includes scripts for both full fine-tuning and QLoRA in single-GPU and multi-GPU setups: see the scripts directory of huggingface/alignment-handbook on GitHub.
