Within 10 GB of VRAM and no quantization, it's virtually impossible to run a model larger than about 3B. With 4-bit GGUF quantization, however, even a 12B model becomes practical in 10 GB, which opens up a much wider range of usable models.
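A minimal sketch of the arithmetic behind those numbers, assuming 16 bits per weight for an unquantized (FP16) model and roughly 4.5 bits per weight for a typical 4-bit GGUF quant, and ignoring KV cache and runtime overhead (which typically add another 1-3 GB):

```python
# Rough VRAM estimate for the model weights alone.
# Assumes: FP16 = 16 bits/weight, typical 4-bit GGUF quant ~ 4.5 bits/weight.
# Ignores KV cache, activations, and runtime overhead.
def weight_vram_gb(params_billions: float, bits_per_weight: float) -> float:
    return params_billions * 1e9 * bits_per_weight / 8 / 1024**3

print(f"3B  FP16  : {weight_vram_gb(3, 16):.1f} GB")    # ~5.6 GB, already tight in 10 GB
print(f"7B  FP16  : {weight_vram_gb(7, 16):.1f} GB")    # ~13 GB, over budget before overhead
print(f"12B 4-bit : {weight_vram_gb(12, 4.5):.1f} GB")  # ~6.3 GB, leaves room for context
```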