Loading adapter-merged models

Hello! I have a question regarding the speed of loading adapter-merged models from Hugging Face.

I have created a model based on `decapoda-research/llama-7b-hf`, available on the Hub at `samhog/psychology-alpaca-merged`.

When I load my model in my Colab workspace, it takes about 10x longer than loading the base model (download speeds are roughly 5-10 MB/s for the fine-tuned model versus 400-600 MB/s for the base model).

I use the following to load the model:

```
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

lora_config = LoraConfig(
    r=8,  # 16,
    lora_alpha=16,  # 32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = AutoModelForCausalLMWithValueHead.from_pretrained(
    config.model_name,
    load_in_8bit=True,
    device_map={"": current_device},  # current_device is the local GPU index
    # device_map="auto",
    peft_config=lora_config,
    # layer_norm_names=[],
    # torch_dtype=torch.float16,
)
```

`config` is a `PPOConfig` object with `model_name="samhog/psychology-alpaca-merged"`.
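For reference, `config` is created roughly like this (a minimal sketch; the other PPO hyperparameters are omitted and the learning rate shown is just a placeholder, not my actual setting):

```
from trl import PPOConfig

config = PPOConfig(
    model_name="samhog/psychology-alpaca-merged",
    learning_rate=1.41e-5,  # placeholder value
)
```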

My question is: does anyone know why `decapoda-research/llama-7b-hf` is so much faster to load than my merged model?

If it helps, here is the script I used to merge the PEFT adapters with the weights from the base model: https://github.com/Jkhedri/Alpaca-LoRA-RLHF-PyTorch/blob/main/tuning_lm_with_rl.py
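In essence, the merge step follows the usual PEFT pattern, something like the sketch below (simplified; the adapter repo name is an assumption on my part, and the exact dtype and arguments in the linked script may differ):

```
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model and attach the trained LoRA adapter
base = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base, "samhog/psychology-alpaca")  # adapter repo name assumed

# Fold the LoRA weights into the base weights and save the standalone model
merged = model.merge_and_unload()
merged.save_pretrained("psychology-alpaca-merged")
```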