Inference of 8-bit or 4-bit models on CPU?

Hello, is it possible to run inference of quantized 8-bit or 4-bit models on CPU?

I don’t believe so, since the bitsandbytes library is essentially a wrapper around CUDA functions, which require a GPU.
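That said, bitsandbytes isn’t the only route to int8 inference: PyTorch’s built-in dynamic quantization runs entirely on CPU. A minimal sketch, using a toy stand-in model rather than a real quantized LLM:

```python
import torch
import torch.nn as nn

# Toy stand-in model; any nn.Module containing Linear layers works the same way.
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 4))
model.eval()

# Dynamic int8 quantization: weights are stored as int8, activations are
# quantized on the fly. This path is CPU-only and needs no CUDA.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

x = torch.randn(1, 16)
with torch.no_grad():
    out = qmodel(x)
print(out.shape)  # torch.Size([1, 4])
```

This only covers 8-bit; for 4-bit CPU inference, other runtimes (e.g. GGUF-based ones like llama.cpp) are the more common choice.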


For those still searching, I found some sources about