Hi all,
I’m using the code below to generate Haskell code with CodeLlama. With the original pre-trained CodeLlama 7B the code runs fine, and it still works after I fine-tune on my own dataset and add the `PeftModel` line. But when I add `merge_and_unload()`, as I did for some other fine-tuned models (such as StarCoder), I get an `Expected a cuda device, but got: cpu` error during inference, and I can see that some layers get moved to the CPU. Why is this happening? And does the model behave the same without `merge_and_unload()`, just applying the LoRA adapter on every forward pass?
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Tokenizer: reuse EOS as the padding token and pad on the left for generation
tokenizer = AutoTokenizer.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    cache_dir=cdir,
)
tokenizer.pad_token = tokenizer.eos_token
tokenizer.pad_token_id = tokenizer.eos_token_id
tokenizer.padding_side = "left"

# 4-bit NF4 quantization with double quantization and fp16 compute
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Base model, sharded automatically across the available devices
model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-7b-hf",
    device_map="auto",
    torch_dtype=torch.float16,
    quantization_config=quant_config,
    low_cpu_mem_usage=True,
    cache_dir=cdir,
)

# Attach the fine-tuned LoRA adapter
model = PeftModel.from_pretrained(model, "path_to_finetuned_checkpoint")
# model = model.merge_and_unload()  # <-- adding this line triggers the CPU/CUDA error

self.tokenizer = tokenizer
self.model = model
```
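
For context, the error shows up during a generation call along these lines. This is only a minimal sketch of the inference step; the prompt and the generation arguments are placeholders rather than my exact settings:

```python
# Minimal inference sketch; prompt and generation settings are placeholders
prompt = "-- Haskell: a function that reverses a list\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=False,
        pad_token_id=tokenizer.pad_token_id,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```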