LoRA Finetuning RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0!

UPDATE: At least for now, the problem seems to be fixed. I downgraded the transformers library to version 4.49.0, used transformers.Trainer instead of the SFTTrainer, and modified the model loading as shown below.
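
For reference, the downgrade itself is just a version pin (a minimal sketch, assuming a pip-based environment):

pip install transformers==4.49.0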

# Imports (transformers and peft; bitsandbytes must be installed for 4-bit loading)
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load bitsandbytes config
bnb_config = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_quant_type="nf4",
                                bnb_4bit_compute_dtype="float16", bnb_4bit_use_double_quant=False)

# Load LoRA configuration
peft_config = LoraConfig(
    r=64, lora_alpha=16, lora_dropout=0, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"])

# Load the model and prepare it for peft finetuning
model_name = "meta-llama/Llama-3.1-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_name, quantization_config=bnb_config, device_map="auto")

model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, peft_config)
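
For completeness, here is a minimal sketch of how the plain transformers.Trainer can be wired up in place of the SFTTrainer. The training arguments, output path, and train_dataset are placeholders rather than my exact setup, and train_dataset is assumed to be an already tokenized dataset:

from transformers import (AutoTokenizer, Trainer, TrainingArguments,
                          DataCollatorForLanguageModeling)

# Tokenizer for the same model; Llama has no pad token by default
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token

training_args = TrainingArguments(
    output_dir="./llama-3.1-8b-lora",   # placeholder output path
    per_device_train_batch_size=1,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    learning_rate=2e-4,
    fp16=True,
    logging_steps=10,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,          # assumed pre-tokenized dataset
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()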

Maybe this will help someone in the future!
