Am I correct in understanding that Q_4_K_S quantization means 4-bit quantization with k-quantization (K) and a small (S) block size? What does a small block size mean? And does Q_3 mean 3-bit quantization? Thanks!
I can’t say I understand it correctly either, so I tried searching for it. If in doubt, use Q4_K_M.
https://www.reddit.com/r/LocalLLaMA/comments/1d1sc50/gguf_weight_encoding_suffixes_is_there_a_guide/
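For what it's worth, here is how I read the k-quant names, as a rough sketch rather than anything official. The number after Q is the nominal bits per weight (so Q3 is roughly 3-bit), K marks the k-quant family, and the trailing S/M/L is a small/medium/large variant of the quantization mix, as far as I can tell, not a block size. The helper `parse_quant_name` below is hypothetical (it is not part of llama.cpp or any library) and only handles the K-style names, with or without the extra underscore after Q.

```python
import re

# Assumed meaning of the trailing letter: a size variant of the mix, not a block size.
SIZE_NAMES = {"S": "small", "M": "medium", "L": "large"}


def parse_quant_name(name):
    """Split a k-quant suffix such as 'Q4_K_S' into its parts (illustrative only)."""
    match = re.fullmatch(r"Q_?(\d+)_K(?:_([SML]))?", name.upper())
    if match is None:
        raise ValueError(f"not a k-quant style name: {name!r}")
    return {
        "bits": int(match.group(1)),               # nominal bits per weight
        "k_quant": True,                           # the K marker is present
        "variant": SIZE_NAMES.get(match.group(2)), # None when there is no S/M/L
    }


if __name__ == "__main__":
    for example in ("Q4_K_S", "Q4_K_M", "Q3_K_L", "Q6_K"):
        print(example, parse_quant_name(example))
```

Names without the K, such as Q4_0 or Q8_0, are deliberately rejected by this sketch, since they belong to the older non-k formats.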