Thanks! Your solution works.
Did you figure out a similar solution for Llama-2 7B?
I have already fine-tuned a Llama-2 model on a QA dataset. Below is the code snippet for model loading during fine-tuning, where I used device_map="auto". I have 2 GPUs and I am utilizing both during the fine-tuning.
# Load the fine-tuned Llama-2 model
base_model = "meta-llama/Llama-2-7b-chat-hf"
compute_dtype = getattr(torch, "float32")
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)
llma2_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
    load_in_4bit=True,
)
llma2_model.config.use_cache = False
llma2_model.config.pretraining_tp = 1
# Load the Llama-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True, fast=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
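For context, the lora_A/lora_B entries in the parameter dump further down suggest this fine-tuning run uses a PEFT/LoRA adapter on top of the 4-bit base model. A minimal sketch of that setup (the LoraConfig values here are illustrative assumptions, not the original ones):

from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Prepare the 4-bit base model for training (casts layer norms, enables input grads, etc.)
llma2_model = prepare_model_for_kbit_training(llma2_model)

# Illustrative LoRA settings; r, lora_alpha, and target_modules are assumptions
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj"],
)
llma2_model = get_peft_model(llma2_model, peft_config)
llma2_model.print_trainable_parameters()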
I was saving model checkpoints during the process. Now I want to load a checkpoint and train it further. However, when I try to load my checkpoint the same way I loaded the Hugging Face Llama-2 7B model, it generates the following error when the model starts the training process.
ERROR:
`"name": "RuntimeError",`
`"message": "Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu! (when checking argument for argument mat2 in method wrapper_CUDA_mm)",`
Code used for loading the checkpoint:
base_model="/home/LLAMA2/checkpoints/checkP4/checkpoint-160000"
compute_dtype = getattr(torch, "float32")
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=False,
)
llma2_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
    load_in_4bit=True,
)
llma2_model.config.use_cache = False
llma2_model.config.pretraining_tp = 1
# Load the Llama-2 tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model, trust_remote_code=True, fast=True)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "right"
I have tried moving the model to a device using the code below:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
llma2_model.to(device)
However, it gives an error since the model is quantized:
`You shouldn't move a model when it is dispatched on multiple devices.Using device: cuda`
`ValueError: \`.to` is not supported for `4-bit` or `8-bit` bitsandbytes models. Please use the model as it is, since the model has already been set to the correct devices and casted to the correct `dtype`.`
Here is a sample of the model parameters and their locations; as you can see, they are spread across both GPU 0 and GPU 1:
model.layers.12.self_attn.k_proj.weight is on cuda:0
model.layers.12.self_attn.v_proj.lora_A.default.weight is on cuda:0
model.layers.12.self_attn.v_proj.lora_B.default.weight is on cuda:0
model.layers.12.self_attn.v_proj.base_layer.weight is on cuda:0
model.layers.12.self_attn.o_proj.weight is on cuda:0
model.layers.12.mlp.gate_proj.weight is on cuda:0
model.layers.12.mlp.up_proj.weight is on cuda:0
model.layers.12.mlp.down_proj.weight is on cuda:0
model.layers.12.input_layernorm.weight is on cuda:0
model.layers.12.post_attention_layernorm.weight is on cuda:0
model.layers.13.self_attn.q_proj.lora_A.default.weight is on cuda:1
model.layers.13.self_attn.q_proj.lora_B.default.weight is on cuda:1
model.layers.13.self_attn.q_proj.base_layer.weight is on cuda:1
model.layers.13.self_attn.k_proj.weight is on cuda:1
Can someone provide a solution to this issue? All I want is to load my saved checkpoint and proceed with further fine-tuning, but I couldn't find a solution. I would be really glad if someone can offer one.
Try specifying the CUDA device:
device = "cuda:0"
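Since a 4-bit bitsandbytes model cannot be moved with .to(), the equivalent of pinning it to cuda:0 is to request a single-device placement at load time. A sketch, assuming one GPU has enough memory for the quantized model:

# Place every module on GPU 0 instead of sharding across GPUs with "auto"
llma2_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map={"": 0},  # whole model on cuda:0, so no .to() call is needed
)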
The issue is that I use a quantized model, which doesn't allow me to call .to(device).
Furthermore, what I am trying to do is continue fine-tuning from an already saved checkpoint.
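In case it helps, one commonly suggested pattern is to load the quantized base model first and then attach the checkpoint's LoRA adapter explicitly, so the adapter tensors follow the base layers' device placement. A minimal sketch, assuming checkpoint-160000 is a PEFT adapter directory (containing adapter_config.json and adapter weights) and reusing the quant_config from the earlier snippet:

from peft import PeftModel

base_model = "meta-llama/Llama-2-7b-chat-hf"
adapter_dir = "/home/LLAMA2/checkpoints/checkP4/checkpoint-160000"

# Load the 4-bit base model; device_map="auto" spreads its layers over both GPUs
llma2_model = AutoModelForCausalLM.from_pretrained(
    base_model,
    quantization_config=quant_config,
    device_map="auto",
)

# Attach the saved LoRA adapter on top of the base model; is_trainable=True keeps
# the adapter weights trainable so fine-tuning can resume from the checkpoint
llma2_model = PeftModel.from_pretrained(llma2_model, adapter_dir, is_trainable=True)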
I'm afraid I'm having the same problem. Mistral 7B Instruct, two 16 GB GPUs, bitsandbytes 4-bit quantization: it runs fine on the first GPU, then crashes with this error on the input dispatched to the second GPU. At this point I am frustrated and running it on only one GPU. FAR less than optimal...
I'm running into the same problem. Have you figured out how to solve it? Thank you.
This is indeed very good
for i in model.named_parameters():
    print(f"{i[0]} -> {i[1].device}")
Thanks to this I was able to find where my embeddings were!
TripletEmbedding(
(embedding): Embedding(2576744, 512)
)
embedding.weight -> cuda:0
@loretoparisi Did this solve your issue? What did you change after seeing that output?
I'm also getting this error.