Colab RAM Limit Exceeded: Unable to Run 3B Model Even with Quantization

I am currently trying the new model stabilityai/stablecode-completion-alpha-3b on a free Colab notebook with a T4 GPU (12 GB of system RAM and 14 GB of GPU RAM).

This is the code I am using:

from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("stabilityai/stablecode-completion-alpha-3b")
model = AutoModelForCausalLM.from_pretrained(
  "stabilityai/stablecode-completion-alpha-3b",
  trust_remote_code=True,
  load_in_8bit=True,
  device="cuda",
)

As soon as the model starts to load, the system RAM fills up until I get a warning that the RAM has been exhausted and the environment restarts. I don't understand why the model is being loaded into system RAM instead of GPU RAM, or how a quantized 3B model can use up all 12 GB of system RAM.
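For context, here is my own back-of-the-envelope estimate of the weight memory (ignoring activations and any loading overhead, so treat the numbers as rough):

```python
# Rough weight-memory estimate for a 3B-parameter model.
# These are my own ballpark figures, not from any official documentation.
n_params = 3_000_000_000

bytes_fp32 = n_params * 4  # full precision (4 bytes/param)
bytes_fp16 = n_params * 2  # half precision (2 bytes/param)
bytes_int8 = n_params * 1  # 8-bit quantized (1 byte/param)

gib = 1024 ** 3
print(f"fp32: {bytes_fp32 / gib:.1f} GiB")  # ~11.2 GiB
print(f"fp16: {bytes_fp16 / gib:.1f} GiB")  # ~5.6 GiB
print(f"int8: {bytes_int8 / gib:.1f} GiB")  # ~2.8 GiB
```

So the 8-bit weights alone should only be around 3 GiB, which is why I expected this to fit comfortably; notably, the full-precision figure is roughly the size of the system RAM that gets exhausted.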