How to load the fine-tuned model (merged weights) on Colab?

I have fine-tuned the Llama 2 model, reloaded the base model, and merged the LoRA weights into it. I then saved this merged model, and now I intend to run it.


import torch
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Reload the base model in fp16
base_model = AutoModelForCausalLM.from_pretrained(
    model_name,
    low_cpu_mem_usage=True,
    return_dict=True,
    torch_dtype=torch.float16,
    device_map=device_map,
)
# Attach the LoRA adapter and merge its weights into the base model
model = PeftModel.from_pretrained(base_model, new_model)
model = model.merge_and_unload()
# Save the standalone merged model
model.save_pretrained("path/to/model")
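
For completeness, I believe the tokenizer should be saved next to the merged weights as well, so the directory on disk is fully self-contained; a minimal sketch, assuming the tokenizer is the unmodified one from the same model_name:

from transformers import AutoTokenizer

# Assumption: the base model's tokenizer was not changed during fine-tuning
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.save_pretrained("path/to/model")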

Now I would like to load the model at path/to/model using the following code:

import torch
import transformers

model_config = transformers.AutoConfig.from_pretrained(
    model_id,
    use_auth_token=hf_auth,
)

# Load the merged model; device_map='auto' lets accelerate place
# layers on the GPU and offload whatever does not fit
model = transformers.AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    config=model_config,
    device_map='auto',
    offload_folder="offload",
    torch_dtype=torch.float16,
    use_auth_token=hf_auth,
    offload_state_dict=True,
)
model.eval()
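
After loading, the actual placement can be inspected via the device map that accelerate records when device_map='auto' is used; a quick sketch:

# Entries mapped to 'cpu' or 'disk' mean those layers were offloaded
# rather than placed on the GPU
print(model.hf_device_map)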

My intent behind saving the merged model is to eliminate the dependency on base_model.

Problem

While running the model in Colab, I see no GPU usage; only the CPU is being used, and this crashes the runtime. I would like to know what is causing the GPU not to be used?
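
For reference, a minimal check that the runtime actually exposes a GPU to PyTorch (assuming a Colab GPU runtime is selected under Runtime > Change runtime type):

import torch

if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
    # Roughly how much of the model actually landed on the GPU
    print(f"{torch.cuda.memory_allocated() / 1e9:.2f} GB allocated")
else:
    print("No CUDA device visible to PyTorch")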