Number of parameters reduced after loading in 4bit

Why does the number of parameters decrease when a model is loaded in 4-bit?


When we load in 4-bit, the linear layers are replaced with 4-bit linear layers. These layers report half the number of parameters, but I am also not clear on why the count is halved.

In the source code, when picking 4-bit, the parameter count is divided by 2.

I don’t know why; you can check the code.

I still don’t have a clear answer for this and would love to know. Bumping for visibility.

I too have the same doubt. @sgugger @sayakpaul sorry to tag you guys… I am thinking I could use your help here.

Hi everyone. Have you figured this out? Is there a way to get access to the full weight with 4bit quantization?

cc @ybelkada

This is normal, since a torch.int4 dtype is not supported in PyTorch. Instead, we pack the 4-bit data into a torch.int8 tensor, two 4-bit values per 8-bit element, hence the reported number of parameters is divided by 2 when we quantize in 4-bit!
From @marcsun13
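
To make the packing idea concrete, here is a minimal pure-Python sketch. It is not the library's actual code: the helper names pack_4bit and unpack_4bit are hypothetical, and the choice to put the first value in the high nibble is an assumption made for illustration. It only shows why storing two 4-bit values per 8-bit element halves the number of stored elements, while the original values remain recoverable.

```python
def pack_4bit(values):
    """Pack 4-bit integers (0..15) into bytes, two values per byte.

    Hypothetical helper for illustration; the nibble order is an assumption.
    """
    if any(not 0 <= v <= 15 for v in values):
        raise ValueError("every value must fit in 4 bits (0..15)")
    if len(values) % 2:
        values = list(values) + [0]  # pad to an even length
    packed = bytearray()
    for high, low in zip(values[0::2], values[1::2]):
        packed.append((high << 4) | low)  # first value goes in the high nibble
    return bytes(packed)


def unpack_4bit(packed, count):
    """Recover the first `count` 4-bit values from the packed bytes."""
    values = []
    for byte in packed:
        values.append(byte >> 4)    # high nibble
        values.append(byte & 0x0F)  # low nibble
    return values[:count]


weights = [3, 15, 0, 7, 9, 1, 12, 4]  # eight 4-bit "parameters"
packed = pack_4bit(weights)

print(len(weights), "logical values stored in", len(packed), "bytes")
assert unpack_4bit(packed, len(weights)) == weights
```

Under this packing, counting the elements of the stored tensor gives half the logical parameter count, so multiplying the reported count of the 4-bit layers by 2 should recover the original figure.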