Thanks! Your solution works.
Did you figure out a similar solution for Llama-2 7B?
I have already fine-tuned a Llama-2 model on a QA dataset. Below is the code snippet for model loading during fine-tuning, where I used device_map="auto". I have 2 GPUs and I am utilizing both during the fine-tuning.
# Load the fine-tuned Llama-2 model
base_model = "meta-llama/Llama-2-7b-chat-hf"
compute_dtype = getattr(torch, "float32")
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)
llma2_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
    load_in_4bit=True,
)
llma2_model.config.use_cache = False
llma2_model.config.pretraining_tp = 1
# Load the Llama-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True, fast=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
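For context, the lora_A/lora_B entries in the parameter dump further down suggest this fine-tuning run uses a PEFT/LoRA adapter on top of the 4-bit base model. A minimal sketch of that setup (the LoraConfig values here are illustrative assumptions, not the original ones):

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the 4-bit base model for training (casts layer norms, enables input grads, etc.)
llma2_model = prepare_model_for_kbit_training(llma2_model)

# Illustrative LoRA settings; r, lora_alpha, and target_modules are assumptions
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)
llma2_model = get_peft_model(llma2_model, peft_config)
llma2_model.print_trainable_parameters()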
I was saving model checkpoints during the process. Now I want to load a checkpoint and train it further. However, when I try to load my checkpoint the same way I loaded the Hugging Face Llama-2 7B model, it generates the following error when the model starts the training process.
ERROR:
`"name": "RuntimeError",`
`"message": "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)",`
Code used for loading the checkpoint:
base_model="/home/LLAMA2/checkpoints/checkP4/checkpoint-160000"
compute_dtype = getattr(torch, "float32")
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)
llma2_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
    load_in_4bit=True,
)
llma2_model.config.use_cache = False
llma2_model.config.pretraining_tp = 1
# Load the Llama-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True, fast=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
I have tried moving the model to a device using the code below:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
llma2_model.to(device)
However, it gives an error since the model is quantized:
`You shouldn't move a model when it is dispatched on multiple devices.Using device: cuda`
`ValueError: \`.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.`
Here is a sample of the model parameters and their locations; as you can see, they are spread across both GPU 0 and GPU 1:
model.layers.12.self_attn.k_proj.weight is on cuda:0
model.layers.12.self_attn.v_proj.lora_A.default.weight is on cuda:0
model.layers.12.self_attn.v_proj.lora_B.default.weight is on cuda:0
model.layers.12.self_attn.v_proj.base_layer.weight is on cuda:0
model.layers.12.self_attn.o_proj.weight is on cuda:0
model.layers.12.mlp.gate_proj.weight is on cuda:0
model.layers.12.mlp.up_proj.weight is on cuda:0
model.layers.12.mlp.down_proj.weight is on cuda:0
model.layers.12.input_layernorm.weight is on cuda:0
model.layers.12.post_attention_layernorm.weight is on cuda:0
model.layers.13.self_attn.q_proj.lora_A.default.weight is on cuda:1
model.layers.13.self_attn.q_proj.lora_B.default.weight is on cuda:1
model.layers.13.self_attn.q_proj.base_layer.weight is on cuda:1
model.layers.13.self_attn.k_proj.weight is on cuda:1
Can someone provide a solution to this issue? All I want is to load my saved checkpoint and proceed with further fine-tuning, but I couldn't find a solution. I would be really glad if someone can offer one.
Try specifying the CUDA device:
device = "cuda:0"
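Since a 4-bit bitsandbytes model cannot be moved with .to(), the equivalent of pinning it to cuda:0 is to request a single-device placement at load time. A sketch, assuming one GPU has enough memory for the quantized model:

# Place every module on GPU 0 instead of sharding across GPUs with "auto"
llma2_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map={"": 0},  # whole model on cuda:0, so no .to() call is needed
)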
The issue is that I use a quantized model, which doesn't allow me to call .to(device).
Furthermore, what I am trying to do is continue fine-tuning from an already saved checkpoint.
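In case it helps, one commonly suggested pattern is to load the quantized base model first and then attach the checkpoint's LoRA adapter explicitly, so the adapter tensors follow the base layers' device placement. A minimal sketch, assuming checkpoint-160000 is a PEFT adapter directory (containing adapter_config.json and adapter weights) and reusing the quant_config from the earlier snippet:

from peft import PeftModel

base_model = "meta-llama/Llama-2-7b-chat-hf"
adapter_dir = "/home/LLAMA2/checkpoints/checkP4/checkpoint-160000"

# Load the 4-bit base model; device_map="auto" spreads its layers over both GPUs
llma2_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
)

# Attach the saved LoRA adapter on top of the base model; is_trainable=True keeps
# the adapter weights trainable so fine-tuning can resume from the checkpoint
llma2_model = PeftModel.from_pretrained(llma2_model, adapter_dir, is_trainable=True)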
I'm afraid I'm having the same problem. Mistral 7B Instruct, two 16 GB GPUs, bitsandbytes 4-bit quantization: it runs fine on the first GPU, then crashes with this error on the input dispatched to the second GPU. At this point I am frustrated and running it on only one GPU. FAR less than optimal...
I'm running into the same problem. Have you figured out how to solve it? Thank you.
This is indeed very good
for i in model.named_parameters():
    print(f"{i[0]} -> {i[1].device}")
Thanks to this I was able to find where my embeddings were!
TripletEmbedding(
(embedding): Embedding(2576744, 512)
)
embedding.weight -> cuda:0
@loretoparisi Did this solve your issue? What did you change after seeing that output?
I'm also getting this error.