I have fine-tuned a Mixtral-8x7B model with SFTTrainer and accelerate, using the official training script.
One of the parameters I set during training was --load_in_4bit. After training, I pushed the model to the Hub, and the tensor type shown there is F32.
I think this behavior is correct. While the weights are stored quantized to 4 bits, they are de-quantized during computation. I don’t know much about the specifics myself, but it seems that fine-tuning methods like QLoRA have been optimized to make this work.
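For reference, here is a minimal sketch of how the --load_in_4bit flag typically maps onto a BitsAndBytesConfig (the exact defaults the official script uses may differ), which is where the "stored in 4-bit, de-quantized for compute" behavior comes from:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Sketch only: base weights are stored in 4-bit, but matmuls run in the
# higher-precision compute dtype after on-the-fly de-quantization.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                      # store base weights in 4-bit
    bnb_4bit_quant_type="nf4",              # NF4 quantization (QLoRA default)
    bnb_4bit_compute_dtype=torch.bfloat16,  # de-quantize to bf16 for compute
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mixtral-8x7B-v0.1",          # example model id
    quantization_config=bnb_config,
    device_map="auto",
)
```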
@icpro I’ve never tried working with llama.cpp before.
My initial guess would be no; the quantized weights are determined by the quantization algorithm that is applied to the original weights, and I would assume that different libraries implement different quantization algorithms.
Then again, I still haven’t fully grasped how load_in_4bit performs quantization, so my answer is a guess at best.
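To illustrate why I'd guess the weights aren't interchangeable, here's a toy blockwise absmax 4-bit quantizer. This is not the actual NF4 or GGUF algorithm, just an illustration that the stored codes depend entirely on the quantization scheme, so two libraries with different schemes produce different serialized weights:

```python
import torch

def quantize_block_4bit(w: torch.Tensor):
    # Toy scheme: scale the block so its values fit signed 4-bit codes in [-7, 7].
    scale = w.abs().max() / 7.0
    codes = torch.clamp(torch.round(w / scale), -7, 7).to(torch.int8)
    return codes, scale

def dequantize_block_4bit(codes: torch.Tensor, scale: torch.Tensor):
    # De-quantize back to float for computation.
    return codes.to(torch.float32) * scale

block = torch.randn(64)
codes, scale = quantize_block_4bit(block)
approx = dequantize_block_4bit(codes, scale)
print((block - approx).abs().max())  # small reconstruction error
```

A scheme with a different codebook or block layout (as NF4 and the llama.cpp quant formats have) would store entirely different codes for the same original weights, which is why I wouldn't expect one library to read the other's quantized tensors directly.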