When loading an LLM using int8 quantization as described in LLM.int8(), how are the fp16 weights handled?
from transformers import (
    AutoModelForCausalLM,
    BitsAndBytesConfig,
)
from bitsandbytes.nn.modules import Linear8bitLt

# Request LLM.int8() quantization, keep fp16 weights, and use a tiny outlier
# threshold so that effectively every feature dimension counts as an outlier.
quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_has_fp16_weight=True,
    llm_int8_threshold=0.0000000001,
)
print(quantization_config)

model = AutoModelForCausalLM.from_pretrained(
    "cache/bigscience/bloom-1b7",
    quantization_config=quantization_config,
)

# Print has_fp16_weights for every quantized linear layer.
for k, v in model.named_modules():
    if isinstance(v, Linear8bitLt):
        print("===")
        print(k)
        print(v.weight.has_fp16_weights)
Part of the output is:
transformer.h.23.self_attention.dense
False
===
transformer.h.23.mlp.dense_h_to_4h
False
===
transformer.h.23.mlp.dense_4h_to_h
False
Even though I specified llm_int8_has_fp16_weight as True, the has_fp16_weights attribute of every Linear8bitLt module's weight is printed as False.
Does it internally hold fp16 weights (for the outliers)? How can I access these values?
I'm so confused.
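For reference, here is how I tried to peek at the quantized state. From reading the bitsandbytes source I am guessing that the int8 data and its row-wise scales live in weight.CB / weight.SCB and that each module keeps a MatmulLtState in .state; these attribute names are my assumption and may differ between bitsandbytes versions.

# Sketch, assuming weight.CB / weight.SCB / module.state exist in this
# bitsandbytes version (attribute names are my guess, not confirmed).
for k, v in model.named_modules():
    if isinstance(v, Linear8bitLt):
        print("===", k)
        print("storage dtype:", v.weight.dtype)  # int8 or fp16?
        print("has CB (int8 data):", getattr(v.weight, "CB", None) is not None)
        print("has SCB (row-wise scales):", getattr(v.weight, "SCB", None) is not None)
        print("state.has_fp16_weights:", getattr(v.state, "has_fp16_weights", None))
        break  # one layer is enough to see what is actually stored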