Which model is best for code generation under [b]10 GB[/b]?

Within 10 GB of VRAM with no quantization, it's practically impossible to run a model larger than about 3B: at FP16 the weights alone take ~2 bytes per parameter, so a 3B model already needs ~6 GB before you account for the KV cache and runtime overhead… :sweat_smile:

With 4-bit GGUF quantization, though, even a 12B model is practical in 10 GB (at roughly 0.5 bytes per weight, the weights come to only ~6–7 GB), and at that size there are plenty of usable models.
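
As a rough back-of-the-envelope check (a sketch, not exact numbers: the 1.2× overhead factor for activations/KV cache is just an assumption, and "Q4" is approximated here as ~4.5 effective bits per weight, in the ballpark of Q4_K_M):

```python
# Rough VRAM estimate for weights plus a guessed overhead factor.
# Ignores context length; a long KV cache can push these numbers up.

def vram_gb(params_billion: float, bits_per_weight: float, overhead: float = 1.2) -> float:
    """Approximate VRAM in GB: parameter count * bytes per weight * overhead."""
    weight_bytes = params_billion * 1e9 * (bits_per_weight / 8)
    return weight_bytes * overhead / 1e9

for name, params, bits in [("3B @ FP16", 3, 16), ("12B @ FP16", 12, 16), ("12B @ ~Q4", 12, 4.5)]:
    print(f"{name}: ~{vram_gb(params, bits):.1f} GB")

# 3B @ FP16:  ~7.2 GB  -> tight but fits in 10 GB
# 12B @ FP16: ~28.8 GB -> no chance
# 12B @ ~Q4:  ~8.1 GB  -> fits with room for context
```

The estimate matches the point above: unquantized you're capped around 3B, while a 4-bit 12B model squeezes under 10 GB with headroom to spare.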