hello, is it possible to run inference of quantized 8 bit or 4 bit models on cpu?
I don't believe so, since the bitsandbytes library is just a wrapper around CUDA functions, which are GPU-only.
For those still searching, I found some sources:

- Optimum Intel for quantization on Intel CPUs (🤗 Optimum Intel)
- Core ML has quantization tools for Apple CPUs (Compressing Neural Network Weights)
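To make the idea concrete, here is a minimal pure-Python sketch of absmax 8-bit quantization, the basic scheme these tools build on: scale each weight by the absolute maximum so it fits in the int8 range, then multiply back by the scale at inference time. This is only an illustration of the concept on CPU, not the actual bitsandbytes, Optimum Intel, or Core ML implementation.

```python
def quantize_absmax(weights):
    """Map floats to int8 range [-127, 127] using the absolute-max scale."""
    scale = max(abs(w) for w in weights) / 127.0
    quantized = [round(w / scale) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [q * scale for q in quantized]

# Hypothetical example weights, chosen just for demonstration.
weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_absmax(weights)
restored = dequantize(q, scale)
```

Storing `q` (1 byte per weight) plus one `scale` per tensor is what gives the roughly 4x memory saving over float32; the real libraries apply this per block or per channel and fuse the dequantization into the matmul.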