Hello, is it possible to run inference of quantized 8-bit or 4-bit models on CPU?
I don't believe so, since the bitsandbytes library is just a wrapper around CUDA functions, which are GPU-only.
For those still searching, I found some sources:

- Optimum Intel has quantization tools for Intel CPUs: 🤗 Optimum Intel
- Core ML has quantization tools for Apple CPUs: Compressing Neural Network Weights
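
For the Optimum Intel route, here is a minimal sketch of dynamic int8 post-training quantization using optimum-intel with Intel Neural Compressor, based on the library's documented workflow (the model name and save directory are just examples):

```python
from transformers import AutoModelForSequenceClassification
from neural_compressor.config import PostTrainingQuantConfig
from optimum.intel import INCQuantizer

# Example model; substitute your own checkpoint.
model_id = "distilbert-base-uncased-finetuned-sst-2-english"
model = AutoModelForSequenceClassification.from_pretrained(model_id)

# Dynamic post-training quantization: weights are stored in int8,
# activations are quantized on the fly at inference time on CPU,
# so no calibration dataset is needed.
quantization_config = PostTrainingQuantConfig(approach="dynamic")
quantizer = INCQuantizer.from_pretrained(model)
quantizer.quantize(
    quantization_config=quantization_config,
    save_directory="quantized_model",
)
```

The quantized model can then be reloaded for CPU inference with `INCModelForSequenceClassification.from_pretrained("quantized_model")`.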
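And for the Core ML route, a sketch using the weight quantization utility from coremltools (the API described on the linked "Compressing Neural Network Weights" page), assuming you already have a full-precision `.mlmodel`; the file paths here are placeholders:

```python
import coremltools as ct
from coremltools.models.neural_network import quantization_utils

# Load an existing full-precision Core ML model
# ("model.mlmodel" is a placeholder path).
model = ct.models.MLModel("model.mlmodel")

# Quantize the weights to 8 bits; nbits can also be
# lower (e.g. 4) for more aggressive compression.
quantized_model = quantization_utils.quantize_weights(model, nbits=8)
quantized_model.save("model_quantized.mlmodel")
```

Note that this quantizes the stored weights to shrink the model; compute still happens in the precision Core ML picks for the target device.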