Model merging leads to different output

I am working towards applying llm2vec to my own dataset and model. It is a procedure with two fine-tuning stages; the details are not important here.

I am noticing that the model output changes after calling .merge_and_unload(). About 10% of the final activations are identical, but some of them differ significantly. Is this expected behaviour?

I am guessing that some precision might be lost in the merging process, and that this could compound through the 32 layers?

As a side note, the original model is loaded in bf16, while the PEFT model uses float32.
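
For reference, here is roughly how I am measuring this; a minimal sketch where the model and adapter names are placeholders for my actual checkpoints:

```python
import torch
from transformers import AutoModel, AutoTokenizer
from peft import PeftModel

# Placeholders -- substitute the actual base model and LoRA adapter.
base_name = "my-org/my-base-model"
base = AutoModel.from_pretrained(base_name, torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base, "my-org/my-lora-adapter")
tok = AutoTokenizer.from_pretrained(base_name)

inputs = tok("An example sentence to encode.", return_tensors="pt")
model.eval()

with torch.no_grad():
    # Mean-pool the last hidden state into one sentence vector
    # (4096-dimensional for my model).
    out_unmerged = model(**inputs).last_hidden_state.mean(dim=1)

merged = model.merge_and_unload()  # folds the LoRA weights into the base weights
with torch.no_grad():
    out_merged = merged(**inputs).last_hidden_state.mean(dim=1)

diff = (out_merged - out_unmerged).float()
print("RMSE:", diff.pow(2).mean().sqrt().item())
print("fraction identical:", (diff == 0).float().mean().item())
```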

cc @BenjaminB

Yes, there is a bit of loss of precision caused by merging, which is inevitable.
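
To give a feel for where the rounding comes from, here is a toy sketch (not PEFT's actual implementation, and with the LoRA scaling factor omitted): in the unmerged model, the float32 LoRA branch is computed separately and its result is added to the bf16 base output, whereas merging rounds W + BA to bf16 once per weight entry, so the two paths round at different points.

```python
import torch

torch.manual_seed(0)
# Toy stand-ins: a bf16 base weight plus a small float32 rank-8 LoRA update.
W = torch.randn(64, 64, dtype=torch.bfloat16)
A = torch.randn(8, 64) * 0.01   # float32
B = torch.randn(64, 8) * 0.01   # float32
x = torch.randn(1, 64, dtype=torch.bfloat16)

# Unmerged path: bf16 base matmul plus a float32 LoRA branch, summed afterwards.
y_unmerged = x @ W.T + (x.float() @ A.T @ B.T).to(torch.bfloat16)

# Merged path: the LoRA delta is folded into the weight and rounded to bf16 first.
W_merged = (W.float() + B @ A).to(torch.bfloat16)
y_merged = x @ W_merged.T

# The rounding happens at different points, so the outputs differ slightly.
print((y_merged.float() - y_unmerged.float()).abs().max().item())
```

A small per-layer difference like this then compounds through all 32 layers, as you suspected.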

> About 10% of the final activations are identical, but some of them differ significantly

How much is “significant” in this context?

Thank you for the fast reply.
The model I am using is meant for encoding sentences, so it takes the mean of the last layer's activations, which results in a vector of size 4096.
Looking at a single example, I calculate an RMSE of 0.0458 between the outputs of the merged and unmerged models.
Here are some more stats on the element-wise difference between the two output vectors:

count    4096.000000
mean       -0.000345
std         0.045758
min        -0.312500
25%        -0.031250
50%         0.000000
75%         0.031250
max         0.500000
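
(In case it is useful: the table above is just pandas' describe() over the 4096 per-dimension differences, computed along these lines, with diff being the difference vector from before.)

```python
import pandas as pd

# diff: the (4096,) element-wise difference between merged and unmerged outputs
print(pd.Series(diff.flatten().float().numpy()).describe())
```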

Sorry I didn’t reply earlier; I must have missed the notification. Indeed, this kind of discrepancy is in line with what’s expected. The stronger the quantization, the higher the discrepancy (so, e.g., it’s worse for 4-bit than for 8-bit).