Following the example in the README, I tried to run this:
quanto.quantize(self.model, weights=quanto.qfloat8, activations=quanto.qfloat8)
quanto.freeze(self.model)
and got the following error (it is raised later in the code; the calls above run fine):
HuggingFace error: expected mat1 and mat2 to have the same dtype, but got: float != c10::Half
The same error occurs when I try to use int4. The same call works with int8, i.e.:
quanto.quantize(self.model, weights=quanto.qint8, activations=quanto.qint8)
I’m not entirely sure whether the failure is in my code or in the quanto library. I did successfully run the example that ships with quanto after changing it to float8. My guess is that the error happens because float8 and int4 are types that don’t exist in PyTorch, whereas int8 does.
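
The error message says a matrix multiply received one float32 operand (float) and one float16 operand (c10::Half), so one way to narrow this down is to count which dtypes the model's tensors hold before quantizing. Below is a minimal sketch of that check, not a fix. The helper names dtype_census and find_by_dtype are hypothetical and not part of quanto or PyTorch. The sketch assumes the model is a torch.nn.Module, so that named_parameters() yields (name, tensor) pairs, and it uses stand-in objects so it runs without PyTorch installed.

```python
from collections import Counter
from types import SimpleNamespace


def dtype_census(named_tensors):
    """Count how many named tensors hold each dtype.

    `named_tensors` is any iterable of (name, tensor-like) pairs whose second
    item has a `.dtype` attribute, e.g. `model.named_parameters()`.
    (Hypothetical helper, not a quanto or PyTorch function.)
    """
    return Counter(str(tensor.dtype) for _, tensor in named_tensors)


def find_by_dtype(named_tensors, dtype_name):
    """Return the names of tensors whose dtype string equals `dtype_name`."""
    return [
        name for name, tensor in named_tensors if str(tensor.dtype) == dtype_name
    ]


# Stand-in objects so the sketch runs without PyTorch; with a real model,
# pass `self.model.named_parameters()` instead (assumption: torch.nn.Module).
fake_params = [
    ("encoder.weight", SimpleNamespace(dtype="torch.float16")),
    ("encoder.bias", SimpleNamespace(dtype="torch.float16")),
    ("head.weight", SimpleNamespace(dtype="torch.float32")),
]

census = dtype_census(fake_params)
odd_ones = find_by_dtype(fake_params, "torch.float32")

print(census)    # how many tensors use each dtype
print(odd_ones)  # names of the tensors that are still float32
```

With a real model, calling dtype_census(self.model.named_parameters()) and dtype_census(self.model.named_buffers()) before the quantize call would show whether the model already mixes float16 and float32. How quantized weights report their dtype after freeze is not something this sketch assumes, so it is only meant for the pre-quantization check.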