Mixtral-8x7B trained with `--load_in_4bit` shown as Tensor type F32


I have fine-tuned a Mixtral-8x7B model with SFTTrainer and Accelerate, using the official training script.

One of the parameters I set during training is --load_in_4bit. After training, I pushed the model to the Hub, and the Tensor type listed there is F32:

  • Does this mean that after training, the weight data type reverted to F32?
  • If I quantize this model back to 4 bits, will I get exactly the same performance?

Could someone shed some light on this? Am I doing things right?

Hello @icpro,

I think this behavior is correct. While the weights are stored in 4 bits, they are de-quantized during computation. I don’t know much about the specifics myself, but it seems the training/fine-tuning process (as in QLoRA) has been optimized to enable this.
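To make the storage-vs-compute distinction concrete, here is a toy sketch in plain Python. This is *not* bitsandbytes' actual NF4 algorithm, just a simple absmax scheme to show the idea: weights live as small integers plus a scale factor, and are expanded back to floats whenever a computation needs them.

```python
# Toy 4-bit absmax quantization (illustrative only, not bitsandbytes' NF4):
# store each weight as an integer in [-7, 7] plus one shared scale,
# then de-quantize back to floats for compute. The round trip is lossy.

def quantize_4bit(weights):
    """Map floats to integer codes in [-7, 7] via a shared absmax scale."""
    scale = max(abs(w) for w in weights) / 7.0
    qweights = [round(w / scale) for w in weights]
    return qweights, scale

def dequantize_4bit(qweights, scale):
    """Recover approximate floating-point values from the 4-bit codes."""
    return [q * scale for q in qweights]

weights = [0.12, -0.54, 0.33, 0.91, -0.07]
q, s = quantize_4bit(weights)
restored = dequantize_4bit(q, s)  # close to the originals, but not equal
```

The de-quantized values are what the matmuls actually see during QLoRA training, which is also why a checkpoint saved from them can end up as full-precision floats on the Hub.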

Here’s a link for some context: Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

@Chahnwoo many thanks for your answer. Does this mean that if I 4-bit-quantize it with llama.cpp, I’ll get exactly the same weights?

@icpro I’ve never tried working with llama.cpp before.

My initial guess would be no: the quantized weights are determined by the quantization algorithm applied to the original weights, and I would assume that different libraries implement different quantization algorithms.
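A small sketch of why I'd guess that. Both schemes below are made up for illustration (the codebook is not the real NF4 table, and this is not llama.cpp's format), but they show that a uniform scheme and a non-uniform codebook scheme snap the same weight to different reconstructed values:

```python
# Illustrative only: two different 4-bit schemes quantize the same weight
# to different values, so re-quantizing with another library generally
# won't reproduce the original quantized weights bit-for-bit.

def quantize_uniform(w, absmax):
    """Uniform absmax scheme: 15 evenly spaced levels in [-absmax, absmax]."""
    scale = absmax / 7.0
    return round(w / scale) * scale

def quantize_codebook(w, absmax, codes):
    """Codebook scheme (NF4-style idea): snap to the nearest stored code."""
    return min(codes, key=lambda c: abs(c - w / absmax)) * absmax

# A non-uniform codebook with more resolution near zero
# (made-up values, not the real NF4 table).
codes = [-1.0, -0.7, -0.45, -0.3, -0.2, -0.1, -0.05, 0.0,
         0.05, 0.1, 0.2, 0.3, 0.45, 0.7, 1.0]

w, absmax = 0.33, 1.0
uniform = quantize_uniform(w, absmax)             # lands on 2/7 ≈ 0.2857
nonuniform = quantize_codebook(w, absmax, codes)  # lands on 0.3
```

Same input weight, two different reconstructions, so I wouldn't expect identical weights (or identical performance) after switching quantization schemes.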

Then again, I still haven’t fully grasped how load_in_4bit performs quantization, so my answer is a guess at best.