Hello! I have a question regarding the loading speed of adapter-merged models from the Hugging Face Hub.
I have created a model based on decapoda-research/llama-7b-hf, published at samhog/psychology-alpaca-merged on the Hub.
When I load my model into my Colab workspace, it takes about 10x longer than loading the base model (download speeds are approx. 5-10 MB/s for the fine-tuned model versus 400-600 MB/s for the base model).
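For anyone who wants to compare the two repos, something like this (a minimal sketch using `huggingface_hub`) lists each repo's total checkpoint size:

```
from huggingface_hub import HfApi

api = HfApi()
for repo_id in ["decapoda-research/llama-7b-hf", "samhog/psychology-alpaca-merged"]:
    # files_metadata=True populates the size of every file in the repo
    info = api.model_info(repo_id, files_metadata=True)
    total = sum(
        f.size or 0
        for f in info.siblings
        if f.rfilename.endswith((".bin", ".safetensors"))
    )
    print(f"{repo_id}: {total / 1e9:.1f} GB of weight files")
```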
I use:

```
import torch
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead

lora_config = LoraConfig(
    r=8,  # 16,
    lora_alpha=16,  # 32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)

current_device = torch.cuda.current_device()  # assumption: Colab's single GPU

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    config.model_name,
    load_in_8bit=True,
    device_map={"": current_device},
    # device_map="auto",
    peft_config=lora_config,
    # layer_norm_names=[],
    # torch_dtype=torch.float16,
)
```
``config`` is a PPOConfig object with ``model_name="samhog/psychology-alpaca-merged"``.
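In case it matters, the config is built roughly like this (a minimal sketch; the other PPOConfig fields are omitted since only ``model_name`` is relevant here):

```
from trl import PPOConfig

config = PPOConfig(
    model_name="samhog/psychology-alpaca-merged",
)
```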
My question is: does anyone know why ``decapoda-research/llama-7b-hf`` is so much faster to load?
If needed, here is the script I used to merge the PEFT adapters with the base-model weights (the merge follows the standard PEFT pattern, sketched after the link): https://github.com/Jkhedri/Alpaca-LoRA-RLHF-PyTorch/blob/main/tuning_lm_with_rl.py
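For reference, the merge boils down to the usual PEFT flow below. This is a sketch, not the script itself: the adapter repo id and the float16 dtype are assumptions for illustration, and the linked script is what I actually ran.

```
import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "decapoda-research/llama-7b-hf",
    torch_dtype=torch.float16,  # assumption: dtype chosen at save time sets the checkpoint size
)

# assumption: "samhog/psychology-alpaca" stands in for the actual adapter repo
model = PeftModel.from_pretrained(base, "samhog/psychology-alpaca")

# Fold the LoRA weights into the base model and drop the adapter wrappers
merged = model.merge_and_unload()
merged.save_pretrained("psychology-alpaca-merged")
```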