Llama 3 PEFT DDP

Hi everyone (noob here)! I am currently trying to fine-tune Llama 3 using QLoRA and want to do so on two GPUs in parallel. I have tried in vain to do this with torchrun and always get the following error:

ValueError: You can't train a model that has been loaded in 8-bit precision on a different device than the one you're training on. Make sure you loaded the model on the correct device using for example `device_map={'':torch.cuda.current_device()}` or `device_map={'':torch.xpu.current_device()}`

I have, of course, changed the device map to the current CUDA device, but that did not help; I still get exactly the same error.
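For context, this is roughly how I am loading the model on each process (the model id and 4-bit settings are just what I'm using locally; the relevant part is the `device_map`, and `LOCAL_RANK` is the environment variable torchrun sets):

```python
import os
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# torchrun sets LOCAL_RANK for each spawned process
local_rank = int(os.environ.get("LOCAL_RANK", 0))
torch.cuda.set_device(local_rank)

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",   # example model id
    quantization_config=bnb_config,
    device_map={"": local_rank},    # pin the whole model to this process's GPU
)
```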

Any help or resources on the matter would be greatly appreciated!

Hi,

I'd recommend this guide: Efficiently fine-tune Llama 3 with PyTorch FSDP and Q-Lora. It walks through fine-tuning Llama 3 with QLoRA on top of FSDP, which is PyTorch's successor to DDP.
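If it helps, here is a rough sketch of the kind of QLoRA setup that approach relies on (the model id and hyperparameter values are illustrative, not taken from the guide); the `bnb_4bit_quant_storage` dtype is the piece that lets FSDP shard the 4-bit weights like ordinary bf16 tensors:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_quant_storage=torch.bfloat16,  # needed so FSDP can wrap the quantized layers
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3-8B",   # example model id
    quantization_config=bnb_config,
    torch_dtype=torch.bfloat16,
)

# LoRA adapters on the linear layers; values here are illustrative defaults
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules="all-linear",
    task_type="CAUSAL_LM",
)
```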

There's also the Alignment Handbook, which includes scripts for both full fine-tuning and QLoRA in single- and multi-GPU setups: alignment-handbook/scripts at main · huggingface/alignment-handbook · GitHub.
