How to load a fine-tuned model (merged weights) on Colab?

I have fine-tuned the Llama 2 model, reloaded the base model, and merged the LoRA weights into it. I then saved the merged model, and now I intend to run it.


import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in half precision
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)

# Attach the LoRA adapter and fold its weights into the base model
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()

# Save the standalone merged model
model.save_pretrained("path/to/model")

Now, I would like to load the model from path/to/model using the following code:

import torch
import transformers

# model_id points at the merged checkpoint saved above, i.e. path/to/model
model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth,
)

model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    device_map='auto',           # let Accelerate place layers on GPU/CPU/disk
    offload_folder="offload",    # where layers that do not fit are spilled
    torch_dtype=torch.float16,
    use_auth_token=hf_auth,
    offload_state_dict=True,
)
model.eval()
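
For reference, here is a minimal generation sketch for running the loaded model. It assumes the tokenizer is also available at model_id (an assumption, since the snippet above does not load one), and the prompt is just an example:

tokenizer = transformers.AutoTokenizer.from_pretrained(model_id, use_auth_token=hf_auth)

# Tokenize a prompt and move the tensors to wherever the first layer lives
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))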

My intent behind saving the merged model is to eliminate the dependency on the base model.
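
As a side note, for the saved directory to be fully self-contained it also needs the tokenizer files. A minimal sketch, assuming the tokenizer comes from the same model_name used above:

from transformers import AutoTokenizer

# Save the tokenizer next to the merged weights so path/to/model
# can be loaded without referencing the base model at all
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained("path/to/model")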

Problem

While running the model in Colab, I see no GPU usage; only the CPU is being used, and this crashes the runtime. I would like to know what is causing the GPU not to be used.
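
One way to narrow this down is to check that PyTorch can see the GPU at all, and then inspect where Accelerate actually placed the layers (hf_device_map is populated whenever device_map is passed to from_pretrained):

import torch

# True only when the Colab runtime type is set to GPU
print(torch.cuda.is_available())

# Maps each module of the loaded model to a device; entries saying
# "cpu" or "disk" mean those layers were offloaded off the GPU
print(model.hf_device_map)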

Would you try saving only the adapter? When you need the model, load the base model and the adapter, merge them, and then run it, as in the sketch below. I think saving the merged model triggers a bug.
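
A minimal sketch of that flow, assuming model_name is the base checkpoint and new_model is the directory holding only the saved LoRA adapter:

import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the base model straight onto the GPU
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Attach the adapter and merge at use time instead of saving merged weights
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()
model.eval()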