Having saved a model in 8-bit:

```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_8bit=True,
    device_map="auto",
)
model.save_pretrained("model_weights_test")
```
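As far as I understand, when a quantized model is saved, transformers records the quantization settings in the checkpoint's `config.json`, so one way to see what the checkpoint actually contains is to read that field back (the helper below is just a sketch; the commented-out call uses the directory from my script above):

```python
import json
from pathlib import Path

def saved_quant_config(save_dir):
    """Return the quantization_config recorded in a checkpoint's
    config.json, or None if the checkpoint is unquantized."""
    cfg = json.loads((Path(save_dir) / "config.json").read_text())
    return cfg.get("quantization_config")

# e.g. against the directory saved above:
# print(saved_quant_config("model_weights_test"))
```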
I then loaded it in 4-bit (in a separate script):

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

double_quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "model_weights_test",
    quantization_config=double_quant_config,
    device_map="auto",
)
```
However, I then run into OOM errors that I was not seeing when I simply loaded the original model in 4-bit directly:

```python
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=double_quant_config,
    device_map="auto",
)
```
My question is therefore: does the 4-bit config fail to override the quantization already baked into the saved 8-bit checkpoint? Am I essentially just training an 8-bit model, which would explain the OOM errors?
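For context on why the bit width matters here, a back-of-envelope calculation of the memory taken by the weights alone at each precision (the 7B parameter count below is a hypothetical example, not the model from my scripts):

```python
def weight_bytes(n_params, bits):
    """Approximate memory for model weights alone at a given bit width."""
    return n_params * bits / 8

n = 7e9  # hypothetical 7B-parameter model
gib = 1024 ** 3
print(f"8-bit: {weight_bytes(n, 8) / gib:.1f} GiB")  # 8-bit: 6.5 GiB
print(f"4-bit: {weight_bytes(n, 4) / gib:.1f} GiB")  # 4-bit: 3.3 GiB
```

This ignores activations, optimizer state, and quantization overhead, but it shows that ending up with 8-bit weights roughly doubles the weight footprint relative to 4-bit.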