Why am I not getting the exact output of 4-bit quantization using NF4?

I was working through 4-bit quantization using this article.

To make the explanation clear, the author provides two implementations:

  1. the first one, written from scratch,
  2. and the second one, using the bitsandbytes library.

I got the exact same output as the article using the first one (implemented from scratch), but I did not get the same output using the bitsandbytes library.

My code link: Google Colab

Can anyone tell me the reason behind this?

!pip install -U bitsandbytes

There may be some mathematically significant reason for the difference, but if the output is simply different, it could just be a difference in library versions.

Since neither the article's author nor the Colab code pins a version, the most recent stable release gets installed. That is the version least likely to cause trouble in practical use, but it guarantees nothing about producing identical output.
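
As a first step, it is worth logging the versions that actually resolved in the Colab session, so runs can be compared later. A minimal sketch:

from importlib.metadata import version

# Record the exact library versions this session installed
print("bitsandbytes:", version("bitsandbytes"))
print("torch:", version("torch"))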
There have been real cases in the past where even a format strictly defined in a specification was not followed by the vendor's own implementation.
Isn't it most likely that the article's author, the old bitsandbytes, and the current bitsandbytes are all broadly correct but slightly different from one another? Otherwise one or more of them is buggy, but a bug causing major practical problems would have been noticed by users by now.
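
For reference, the NF4 format itself is small enough to check by hand. Here is a minimal from-scratch sketch of one NF4 block round-trip, assuming the 16 codebook values published with the QLoRA paper; nf4_roundtrip is just an illustrative name, not the article's or the library's function:

import torch

# The 16 NF4 codebook values (as published with the QLoRA paper)
NF4_CODES = torch.tensor([
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
])

def nf4_roundtrip(block):
    # Scale the block into [-1, 1] by its absolute maximum
    absmax = block.abs().max()
    normalized = block / absmax
    # Snap every element to the index of the nearest codebook entry
    idx = (normalized.unsqueeze(-1) - NF4_CODES).abs().argmin(dim=-1)
    # Dequantize by looking up the code and scaling back with absmax
    return NF4_CODES[idx] * absmax

If two implementations disagree on this, the first things to compare are the codebook values themselves and the block size used for absmax scaling.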

The core code seems to be unchanged from 7 months ago, which makes a version difference seem unlikely, but library behavior is an unpredictable thing.

There is a massive difference in the output. Still, I am unable to figure out why.

If so, it is possible that either the article author's implementation or the bitsandbytes implementation does not follow the theory, so that the conversion and inverse conversion round-trip, but would not work in an actual model…?
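
One way to narrow this down is to round-trip a single block through bitsandbytes and through the from-scratch code on the same input, then compare element-wise. A sketch, assuming a CUDA runtime and that quantize_4bit / dequantize_4bit from bitsandbytes.functional are available in the installed version (nf4_roundtrip is the from-scratch helper sketched above):

import torch
import bitsandbytes.functional as F

torch.manual_seed(0)
x = torch.randn(64, device="cuda")  # one 64-element block

# Round-trip through bitsandbytes NF4
q, state = F.quantize_4bit(x, blocksize=64, quant_type="nf4")
x_bnb = F.dequantize_4bit(q, state, blocksize=64, quant_type="nf4")

# Round-trip through the from-scratch sketch on the same data
x_scratch = nf4_roundtrip(x.cpu())

# If the two formats agree, this should be ~0 (up to float rounding)
print("max |bnb - scratch|:", (x_bnb.cpu().flatten() - x_scratch).abs().max().item())

If the difference is large here, the formats themselves disagree; if it is near zero, the discrepancy is more likely in how the article prints or reshapes the results.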

NF4 is a format that is getting a lot of attention, so there may be others besides that author who have tried to analyze it independently. Such a sample might provide a clue to the cause of the problem.
Alternatively, you could try running it on an actual model, but since official torch support still seems to have some issues, that approach may be harder to interpret, because other problems could get mixed in.