Hi,
This is what I have:
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# 4-bit NF4 quantization with nested (double) quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_id = "meta-llama/Llama-3.2-1B"
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=bnb_config, device_map={"": 0}
)
print(model.get_memory_footprint())
The result is 1012013184 bytes (about 1 GB).
Why don't I see a reduction in the memory footprint? When I use "facebook/opt-350m", the footprint is about 207 MB. I have a GTX 1080 on my system. What am I missing?
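For reference, one quick way to see which weights are actually stored in 4-bit is to tally parameter dtypes (a diagnostic sketch; quantized weights show up as packed torch.uint8, two 4-bit values per byte, while non-quantized modules keep a 16/32-bit dtype):

from collections import Counter

# Count parameter elements per storage dtype; packed 4-bit weights
# appear as torch.uint8, non-quantized modules as 16/32-bit floats.
dtype_elements = Counter()
for _, param in model.named_parameters():
    dtype_elements[param.dtype] += param.numel()

for dtype, n in dtype_elements.items():
    print(dtype, f"{n:,} elements")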
Thanks
Mohan
I measured it again with the code below, which reports both the model footprint from PyTorch and the peak GPU memory allocated.
import torch
from transformers import AutoModelForCausalLM

# Reset the peak-memory counter before loading
torch.cuda.reset_peak_memory_stats(device=None)

model_id = "meta-llama/Llama-3.2-1B"
model = AutoModelForCausalLM.from_pretrained(model_id, load_in_4bit=True, device_map={"": 0})
# model = AutoModelForCausalLM.from_pretrained(model_id, device_map={"": 0})  # full-precision baseline

print(model.get_memory_footprint())
print(f"gpu used {torch.cuda.max_memory_allocated(device=None)}")
It was 4.9 GB vs 1 GB (the footprint and CUDA memory numbers were pretty close) between the original and the 4-bit model, so the reduction is there after all. Sorry for the confusion.
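Those numbers also line up with a back-of-the-envelope estimate from the parameter count (a rough sketch; the ~1.24B figure is the published size of Llama-3.2-1B, and the estimate ignores quantization constants and the modules kept in 16-bit):

# Rough size estimate from the parameter count alone (assumes ~1.24B
# parameters; the real footprint also includes quantization constants
# and embeddings/norms kept in higher precision).
params = 1.236e9
print(f"fp32 baseline : {params * 4 / 1e9:.1f} GB")    # ~4.9 GB
print(f"4-bit weights : {params * 0.5 / 1e9:.1f} GB")  # ~0.6 GB packed
# The measured ~1 GB adds the layers that stay in 16-bit, so roughly
# a 5x reduction is what to expect here.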
Thanks
Mohan