OS: Debian 12 with a pinned NVIDIA .run driver (frozen for kernel stability).
LLMs: each in its own Python venv to keep the global stack clean (setup sketch below).
Tools in a default “example-venv”: PyTorch, SciPy, NumPy, pandas, Matplotlib, scikit-learn.
Current favorite: DeepSeek-Coder-V2-Lite-Instruct (GGUF, Q8_0) for offline code help; I run it locally and use the venv to execute/validate.
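For reference, a minimal sketch of how each venv gets set up (the ~/venvs path is arbitrary; the package list matches my example-venv):

# one isolated venv per model / tool stack
python3 -m venv ~/venvs/example-venv
source ~/venvs/example-venv/bin/activate
pip install torch scipy numpy pandas matplotlib scikit-learn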
I’d love feedback on two points:
With a 32 GB GPU, which models are you finding best in practice as a coding assistant?
For longer tasks, do you prefer a slightly smaller model with a bigger context window, or a stronger model, accepting the risk of losing some chat history to context limits?
If you’re looking for a coding-specialized model that quantizes to 32 GB or less (preferably 16 GB or less, to leave memory for context), the Qwen Coder series would be a safe bet. Devstral and NextCoder also seem promising.
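If you want to kick the tires with llama.cpp, something like the below would do it (the repo and quant filename are illustrative, not verified — check the model card, since large GGUFs are often split into multiple files):

huggingface-cli download Qwen/Qwen2.5-Coder-32B-Instruct-GGUF \
  qwen2.5-coder-32b-instruct-q4_k_m.gguf --local-dir ./models
# -ngl 99 offloads all layers to the GPU; -c sets the context window
llama-cli -m ./models/qwen2.5-coder-32b-instruct-q4_k_m.gguf -ngl 99 -c 16384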
How has driver support for native fan control been with your Inno3D 5090?
I was looking at similar RTX 5090 builds for local AI on llamabuilds.ai, and most of the builds there seem to prefer the reference NVIDIA RTX 5090 or MSI models.
How did you resolve the issue with sm_120 support? I tried and couldn’t get it to work. Without sm_120 support, you can’t run inference on the card at all.
sm_120 is only supported in the torch nightly builds. I’m running on dual RTX 5070s.
Make sure you are running CUDA 13.0.1 (CUDA 13.0 Update 1, driver >= 580) and install the torch nightly build from https://download.pytorch.org/whl/nightly/cu130
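The standard nightly install against that index looks like this (add torchvision/torchaudio to the package list if you need them):

# driver version in the nvidia-smi header should be >= 580
nvidia-smi
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu130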
pip show torch
Name: torch
Version: 2.10.0.dev20250910+cu130
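To confirm the wheel actually ships sm_120 kernels (Blackwell cards report compute capability (12, 0)):

python -c "import torch; print(torch.cuda.get_arch_list())"
python -c "import torch; print(torch.cuda.get_device_capability(0))"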