Hello! I have a question regarding the loading speed of adapter-merged models from the Hugging Face Hub.
I have created a model based on decapoda-research/llama-7b-hf, published at samhog/psychology-alpaca-merged on the Hub.
When I load my model into my Colab workspace, it takes about 10x longer than loading the base model (download speeds are approx. 5-10 MB/s for the fine-tuned model versus 400-600 MB/s for the base model).
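For anyone who wants to compare the two repos, something like this (a minimal sketch using `huggingface_hub`) lists each repo's total checkpoint size:

```
from huggingface_hub import HfApi

api = HfApi()
for repo_id in ["decapoda-research/llama-7b-hf", "samhog/psychology-alpaca-merged"]:
    # files_metadata=True populates the size of every file in the repo
    info = api.model_info(repo_id, files_metadata=True)
    total = sum(
        f.size or 0
        for f in info.siblings
        if f.rfilename.endswith((".bin", ".safetensors"))
    )
    print(f"{repo_id}: {total / 1e9:.1f} GB of weight files")
```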
I use:

```
import torch
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

lora_config = LoraConfig(
    r=8,  # 16,
    lora_alpha=16,  # 32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

current_device = torch.cuda.current_device()  # assumption: Colab's single GPU

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    config.model_name,
    load_in_8bit=True,
    device_map={"": current_device},
    # device_map="auto",
    peft_config=lora_config,
    # layer_norm_names=[],
    # torch_dtype=torch.float16,
)
```
``config`` is a PPOConfig object with ``model_name="samhog/psychology-alpaca-merged"``.
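In case it matters, the config is built roughly like this (a minimal sketch; the other PPOConfig fields are omitted since only ``model_name`` is relevant here):

```
from trl import PPOConfig

config = PPOConfig(
    model_name="samhog/psychology-alpaca-merged",
)
```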
My question is: does anyone know why ``decapoda-research/llama-7b-hf`` is so much faster to load?
If needed, here is the script I used to merge the PEFT adapters with the base-model weights (the merge follows the standard PEFT pattern, sketched after the link): https://github.com/Jkhedri/Alpaca-LoRA-RLHF-PyTorch/blob/main/tuning_lm_with_rl.py
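For reference, the merge boils down to the usual PEFT flow below. This is a sketch, not the script itself: the adapter repo id and the float16 dtype are assumptions for illustration, and the linked script is what I actually ran.

```
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    torch_dtype=torch.float16,  # assumption: dtype chosen at save time sets the checkpoint size
)

# assumption: "samhog/psychology-alpaca" stands in for the actual adapter repo
model = PeftModel.from_pretrained(base, "samhog/psychology-alpaca")

# Fold the LoRA weights into the base model and drop the adapter wrappers
merged = model.merge_and_unload()
merged.save_pretrained("psychology-alpaca-merged")
```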