Bitsandbytes `has_fp16_weights` issue

When loading an LLM with int8 quantization as described in LLM.int8(), how are the fp16 weights handled?

from transformers import (
    AutoModelForCausalLM,
    BitsAndBytesConfig,
)
from bitsandbytes.nn.modules import Linear8bitLt

quantization_config = BitsAndBytesConfig(
    load_in_8bit=True,
    llm_int8_has_fp16_weight=True,
    llm_int8_threshold=0.0000000001,
)
print(quantization_config)

model = AutoModelForCausalLM.from_pretrained(
    "cache/bigscience/bloom-1b7",
    quantization_config=quantization_config,
)

for k, v in model.named_modules():
    if isinstance(v, Linear8bitLt):
        print("===")
        print(k)
        print(v.weight.has_fp16_weights)

Example output:

transformer.h.23.self_attention.dense
False
===
transformer.h.23.mlp.dense_h_to_4h
False
===
transformer.h.23.mlp.dense_4h_to_h
False

Even though I set llm_int8_has_fp16_weight to True, every Linear8bitLt module's weight.has_fp16_weights prints as False.
Does it internally hold fp16 weights (for the outliers)? How can I access these values?
I’m so confused.
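
For what it's worth, here is a minimal sketch of how one might poke at what the quantized layer actually stores, reusing the model from the snippet above. It assumes the internal Int8Params/MatmulLtState attributes CB (the row-wise int8 matrix) and SCB (the per-row scales) that bitsandbytes uses, which may differ across versions:

from bitsandbytes.nn.modules import Linear8bitLt

def inspect_int8_layer(layer: Linear8bitLt) -> None:
    # The weight is a bitsandbytes Int8Params, not a plain fp16 tensor.
    w = layer.weight
    print("weight dtype:", w.dtype)              # int8 after quantization
    print("has_fp16_weights:", w.has_fp16_weights)

    # Assumed internals: CB = quantized int8 matrix, SCB = per-row absmax scales.
    CB = getattr(w, "CB", None)
    SCB = getattr(w, "SCB", None)
    if CB is None or SCB is None:
        # Some versions move these onto the module's state after the first forward.
        CB = getattr(layer.state, "CB", None)
        SCB = getattr(layer.state, "SCB", None)
    if CB is not None and SCB is not None:
        # Rough fp16 reconstruction from the int8 data and the scales.
        approx_fp16 = (CB.float() * SCB.view(-1, 1) / 127).half()
        print("approx. dequantized weight:", approx_fp16.shape, approx_fp16.dtype)

for name, module in model.named_modules():
    if isinstance(module, Linear8bitLt):
        inspect_int8_layer(module)
        break  # one layer is enough for a sanity check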

I have the same doubt: where are the detected outliers stored? Since LLM.int8() processes the outliers in fp16 while everything else runs in int8, their positions/values have to live somewhere. Moreover, when I check the model weight dtypes, all of them are int8. I don't understand what's going on here.
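
As far as I understand it, the outliers are not stored at all: the outlier feature dimensions are picked at runtime from the incoming activations (any column whose absolute value exceeds llm_int8_threshold), and only the matching weight columns are dequantized back to fp16 for that one matmul. A purely illustrative sketch of the detection step (not the actual bitsandbytes kernels):

import torch

threshold = 6.0                 # the paper's default; the snippet above set it to ~1e-10
x = torch.randn(8, 4096) * 3    # stand-in for the hidden states entering a Linear8bitLt

# A feature dimension counts as an outlier if any activation in the batch
# exceeds the threshold in absolute value.
outlier_cols = (x.abs() > threshold).any(dim=0).nonzero().flatten()
print("outlier feature dims for this batch:", outlier_cols.tolist())

# Those columns would go through an fp16 matmul against the dequantized weight
# columns, while everything else stays on the int8 path. Nothing extra is saved
# in the checkpoint, which is why every stored weight shows up as int8.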

Also, note that the config parameter llm_int8_has_fp16_weight serves a different use case: it runs LLM.int8() with 16-bit main weights, which is useful for fine-tuning because the weights don't have to be converted back and forth between int8 and fp16 for the backward pass.
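
To make the difference concrete, here is a small sketch with a standalone Linear8bitLt, using the constructor arguments from the bitsandbytes README (exact behavior may vary across versions):

import bitsandbytes as bnb

# Inference setup: the fp16 main weights are discarded and converted to int8
# once the layer is moved to the GPU.
infer_lin = bnb.nn.Linear8bitLt(128, 128, bias=True, has_fp16_weights=False, threshold=6.0)

# Fine-tuning setup: the fp16 main weights are kept alongside the int8 path,
# so gradients don't have to round-trip through the quantized representation.
train_lin = bnb.nn.Linear8bitLt(128, 128, bias=True, has_fp16_weights=True, threshold=6.0)

print(infer_lin.weight.has_fp16_weights)  # False
print(train_lin.weight.has_fp16_weights)  # True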