Model merging leads to different output

I am working towards applying llm2vec to my own dataset and model. It is a procedure with two fine-tuning stages; the details are not important here.

I am noticing that the model output changes after calling .merge_and_unload(). About 10% of the final activations are identical, but some of them differ significantly. Is this expected behaviour?

I am guessing that some precision might be lost in the merging process, and that this could compound through the 32 layers?

As a side note, the original model is loaded in bf16, while the PEFT model uses float32.
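
For reference, here is roughly how I am measuring this; a minimal sketch where the model and adapter names are placeholders for my actual checkpoints:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# Placeholders -- substitute the actual base model and LoRA adapter.
base_name = "my-org/my-base-model"
base = AutoModel.from_pretrained(base_name, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "my-org/my-lora-adapter")
tok = AutoTokenizer.from_pretrained(base_name)

inputs = tok("An example sentence to encode.", return_tensors="pt")
model.eval()

with torch.no_grad():
    # Mean-pool the last hidden state into one sentence vector
    # (4096-dimensional for my model).
    out_unmerged = model(**inputs).last_hidden_state.mean(dim=1)

merged = model.merge_and_unload()  # folds the LoRA weights into the base weights
with torch.no_grad():
    out_merged = merged(**inputs).last_hidden_state.mean(dim=1)

diff = (out_merged - out_unmerged).float()
print("RMSE:", diff.pow(2).mean().sqrt().item())
print("fraction identical:", (diff == 0).float().mean().item())
```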

cc @BenjaminB

Yes, there is a bit of loss of precision caused by merging, which is inevitable.
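
To give a feel for where the rounding comes from, here is a toy sketch (not PEFT's actual implementation, and with the LoRA scaling factor omitted): in the unmerged model, the float32 LoRA branch is computed separately and its result is added to the bf16 base output, whereas merging rounds W + BA to bf16 once per weight entry, so the two paths round at different points.

```python
import torch

torch.manual_seed(0)
# Toy stand-ins: a bf16 base weight plus a small float32 rank-8 LoRA update.
W = torch.randn(64, 64, dtype=torch.bfloat16)
A = torch.randn(8, 64) * 0.01   # float32
B = torch.randn(64, 8) * 0.01   # float32
x = torch.randn(1, 64, dtype=torch.bfloat16)

# Unmerged path: bf16 base matmul plus a float32 LoRA branch, summed afterwards.
y_unmerged = x @ W.T + (x.float() @ A.T @ B.T).to(torch.bfloat16)

# Merged path: the LoRA delta is folded into the weight and rounded to bf16 first.
W_merged = (W.float() + B @ A).to(torch.bfloat16)
y_merged = x @ W_merged.T

# The rounding happens at different points, so the outputs differ slightly.
print((y_merged.float() - y_unmerged.float()).abs().max().item())
```

A small per-layer difference like this then compounds through all 32 layers, as you suspected.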

> About 10% of the final activations are identical, but some of them differ significantly

How much is “significant” in this context?

Thank you for the fast reply.
The model I am using is meant for encoding sentences, so it takes the mean of the last layer's activations, which results in a vector of size 4096.
Looking at a single example, I calculate an RMSE of 0.0458 between the outputs of the merged and unmerged models.
Here are some more stats on the element-wise difference between the two output vectors:

count    4096.000000
mean       -0.000345
std         0.045758
min        -0.312500
25%        -0.031250
50%         0.000000
75%         0.031250
max         0.500000
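
(In case it is useful: the table above is just pandas' describe() over the 4096 per-dimension differences, computed along these lines, with diff being the difference vector from before.)

```python
import pandas as pd

# diff: the (4096,) element-wise difference between merged and unmerged outputs
print(pd.Series(diff.flatten().float().numpy()).describe())
```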

Sorry I didn’t reply earlier; I must have missed the notification. Indeed, this kind of discrepancy is in line with what’s expected. The stronger the quantization, the higher the discrepancy (so, e.g., it’s worse for 4-bit than for 8-bit).