Local LLM and ML platform with RTX 5090 GPU

I built a local AI workstation around an RTX 5090 (32 GB) for an uninterrupted, offline coding workflow.

OS: Debian 12 with a pinned NVIDIA .run driver (frozen for kernel stability).
LLMs: each in its own Python venv to keep the global stack clean (see the sketch after this list).
Tools in a default “example-venv”: PyTorch, SciPy, NumPy, pandas, Matplotlib, scikit-learn.
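Roughly, the per-model isolation looks like this; the `/opt/llm-envs` prefix and model names are illustrative, not my exact layout:

```python
import subprocess
import venv

# One venv per model, so nothing leaks into the global site-packages.
MODELS = ["deepseek-coder-v2-lite", "qwen2.5-coder"]

for name in MODELS:
    env_dir = f"/opt/llm-envs/{name}"
    venv.EnvBuilder(with_pip=True).create(env_dir)
    # Install each model's inference stack only inside its own venv.
    subprocess.run(
        [f"{env_dir}/bin/pip", "install", "llama-cpp-python"],
        check=True,
    )
```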

Short demo + full setup notes:
https://localprompt.ai/demo.mp4
https://localprompt.ai

Current favorite: DeepSeek-Coder-V2-Lite-Instruct (GGUF, Q8_0) for offline code help; I run it locally and use the venv to execute and validate the generated code.
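Here's a minimal sketch of that generate-then-validate loop, assuming llama-cpp-python; the model path, prompt, and venv interpreter path are illustrative, not my exact setup:

```python
import subprocess
import tempfile

from llama_cpp import Llama

# Illustrative GGUF path; point this at wherever the model lives.
llm = Llama(
    model_path="models/DeepSeek-Coder-V2-Lite-Instruct-Q8_0.gguf",
    n_gpu_layers=-1,  # offload all layers to the GPU
    n_ctx=8192,
)

resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Write a Python function that flattens a nested list."}],
)
snippet = resp["choices"][0]["message"]["content"]
# (A real loop would strip markdown fences from the reply first.)

# Validate the generated code with the interpreter from the isolated venv.
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(snippet)
subprocess.run(["/opt/llm-envs/example-venv/bin/python", f.name], check=True)
```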

I’d love feedback on two points:

  1. With a 32 GB GPU, which models are you finding best in practice as a coding assistant?
  2. For longer tasks, do you prefer a slightly smaller model with a bigger context window, or a stronger model, accepting that some of the chat history may fall out of context?

If you’re looking for a coding-specialized model that can be quantized to 32 GB or less (preferably 16 GB or less, to leave memory for context), the Qwen Coder series would be a safe bet. Devstral and NextCoder also seem promising.
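To make the context-memory point concrete, here's a rough back-of-envelope check; the layer/head counts are illustrative, not any specific model's:

```python
# Rough VRAM budget: quantized weights + KV cache must both fit in 32 GB.
def kv_cache_gib(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elt=2):
    # K and V each hold n_layers * n_kv_heads * head_dim values per token (fp16).
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elt / 2**30

weights_gib = 16  # e.g. a ~15B-param model at Q8_0, roughly 1 byte/param
ctx = 32_768
cache = kv_cache_gib(n_layers=48, n_kv_heads=8, head_dim=128, ctx_len=ctx)
print(f"weights ~{weights_gib} GiB + KV cache ~{cache:.1f} GiB at {ctx} tokens")
# -> weights ~16 GiB + KV cache ~6.0 GiB, leaving headroom on a 32 GB card
```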


Thanks a lot, I’ll try them sometime next week and post my findings here.
