Hi!
Is it possible to use the transformers library with GGML-style quantization, for example GGML_TYPE_Q5_K (as detailed here)?
I’m aware that I can use bitsandbytes to do 8-bit and 4-bit quantization, but I’d prefer to use GGML-style quantization.
Thanks!