Can't load fine-tuned Llama 2 7b

I fine-tuned a Llama 2 7b model and uploaded it to Hugging Face, but now when I load it in Google Colab I run out of system RAM. (fine-tuned model: Stoemb/llama-2-7b-html2text)

I loaded the model as follows:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "Stoemb/llama-2-7b-html2text"

# 4-bit NF4 quantization config
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    trust_remote_code=True,
)
model.config.use_cache = False

I'm still learning myself, but I have been playing with a Llama 2 7B model in free Colab, and I've found I need to set device_map to "auto" as well as loading in 4-bit or 8-bit. See the sketch below.
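
Here's a minimal sketch of your snippet with device_map added (note this is just how I've been doing it, and device_map="auto" requires the accelerate package to be installed):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

model_name = "Stoemb/llama-2-7b-html2text"

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# device_map="auto" lets accelerate place the quantized weights
# directly on the GPU instead of staging them all in system RAM
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
model.config.use_cache = False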