Yeah. GGUF files are already quantized, so you can use them as they are. The runtime dequantizes the weights on the fly during inference, but that happens internally and is not something we need to worry about.
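If it helps to picture what "dequantized at runtime" means, here is a rough sketch in plain Python. It is only loosely modeled on block-wise 8-bit schemes (one scale per block of 32 weights); the function names and the sample weights are made up for illustration, and this is not how llama.cpp or Ollama actually implement it.

```python
BLOCK_SIZE = 32  # block length assumed for this sketch


def quantize_block(values):
    """Map one block of floats to int8-range codes plus a single scale."""
    amax = max(abs(v) for v in values)
    scale = amax / 127.0 if amax > 0 else 1.0
    codes = [round(v / scale) for v in values]
    return scale, codes


def dequantize_block(scale, codes):
    """Recover approximate floats; this is the step the runtime does for you."""
    return [scale * c for c in codes]


# Made-up weights in [-1, 1) purely for demonstration.
weights = [((i * 37) % 64 - 32) / 32.0 for i in range(BLOCK_SIZE)]

scale, codes = quantize_block(weights)      # done once, when the file is created
restored = dequantize_block(scale, codes)   # done on the fly at inference time

# Rounding to the nearest code bounds the error by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(weights, restored))
```

The point is that the file only stores `scale` and `codes`; the approximate floats are rebuilt when needed, which is why you never have to do it by hand.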
Hugging Face’s Transformers library is not well suited to running GGUF (it can load GGUF files, but it dequantizes them to full precision on load, so you lose the memory savings). If you want to use GGUF, it is better to run it with Ollama or a similar tool. Transformers supports several quantization formats of its own, but BitsAndBytes is usually sufficient.
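For the BitsAndBytes route, a minimal sketch could look like the following. It assumes `transformers`, `torch`, and `bitsandbytes` are installed and that a CUDA GPU is available; the helper name `load_4bit_model` and the model ID in the usage comment are placeholders, and the NF4/bfloat16 settings are just one common choice, not a recommendation for every model.

```python
def load_4bit_model(model_id):
    """Load a causal LM with 4-bit BitsAndBytes quantization (sketch)."""
    # Imported inside the function so this snippet can be read or imported
    # on a machine without the GPU stack installed.
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
    )

    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,                      # quantize weights to 4 bits on load
        bnb_4bit_quant_type="nf4",              # NF4 data type (assumed choice)
        bnb_4bit_compute_dtype=torch.bfloat16,  # dtype used for the matmuls
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let Accelerate place the layers
    )
    return tokenizer, model


# Usage (placeholder model ID, replace with the one you actually use):
# tokenizer, model = load_4bit_model("your-org/your-model")
```

Here the quantization happens when the model is loaded from the regular (unquantized) checkpoint, so you do not need a separate pre-quantized file.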