Help with merging LoRA to base model

Hi, I have fine-tuned the Llama 8B Instruct model with QLoRA (loaded the base model in 4-bit, then applied LoRA) and saved the trained adapter. I'm using a Google Colab T4 (15 GB) GPU. When I load the base model entirely on the GPU (`device_map="cuda:0"`) in float16 for merging, I get an Out Of Memory error.
If I use the following code instead, I get `NotImplementedError: Cannot copy out of meta tensor; no data!`
Is there any solution?

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL)

model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    device_map="auto",
    offload_folder=OFFLOAD_DIR,
    token=HF_TOKEN,
)

model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)
model = PeftModel.from_pretrained(model, NEW_MODEL, offload_folder=OFFLOAD_DIR)
model = model.merge_and_unload()
```


When using bitsandbytes, I think it is better not to use `device_map="auto"` (that is, not to hand device placement to accelerate), to avoid errors.

`# device_map="auto",`
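
For example, something along these lines might work, assuming the runtime has enough CPU RAM to hold the 8B model in fp16 (a high-RAM Colab runtime or a local machine). It reuses the placeholders from your post (`MODEL_NAME`, `NEW_MODEL`, `HF_TOKEN`); the output directory name is just an example, and I haven't tested this exact setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained(NEW_MODEL)

# Load the base model on the CPU in fp16; no device_map="auto", no disk offload,
# so no weights end up as meta tensors.
base_model = AutoModelForCausalLM.from_pretrained(
    MODEL_NAME,
    torch_dtype=torch.float16,
    low_cpu_mem_usage=True,
    token=HF_TOKEN,
)

# Match the embedding size to the tokenizer used during fine-tuning.
base_model.resize_token_embeddings(len(tokenizer), pad_to_multiple_of=8)

# Attach the LoRA adapter and merge it into the base weights.
model = PeftModel.from_pretrained(base_model, NEW_MODEL)
merged = model.merge_and_unload()

# Save the merged model; "merged-model" is an arbitrary output directory.
merged.save_pretrained("merged-model", safe_serialization=True)
tokenizer.save_pretrained("merged-model")
```

Since nothing is offloaded to disk, `merge_and_unload()` should be able to copy every weight, and the merge happens in CPU RAM instead of the 15 GB of GPU memory.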