Let me say this first: it's reckless. The amount of money you'd need is off by an order of magnitude.
Possible compromises are to use a weaker model with similar characteristics, or to accept hopelessly slow speed and load the model into RAM or onto an SSD instead of VRAM and run inference from there. RAM is cheap these days. Consumer editions of Windows do cap RAM at 128GB, but…
Hugging Face’s accelerate library can also run models stored on an SSD.
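As a rough idea of what that looks like, here is a minimal sketch of disk offload through transformers + accelerate. The model ID is a hypothetical placeholder, and the offload folder path is something you pick yourself; device_map="auto" fills VRAM first, spills to CPU RAM, and pages whatever still doesn't fit out to the folder on disk.

from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "some-org/some-reasoning-model"  # hypothetical placeholder, not a real model ID

tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" lets accelerate place layers on GPU first, then CPU RAM,
# then offload the remainder to the SSD folder below. Anything served from
# disk will be very slow, as noted above.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    offload_folder="offload",   # weights that fit nowhere else are paged to this folder (put it on the SSD)
    offload_state_dict=True,    # keep the checkpoint on disk while loading to limit peak RAM use
    torch_dtype="auto",
)

prompt = "Explain, step by step, why the sky is blue."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Don't expect interactive speeds from this: every forward pass has to pull the offloaded weights off the SSD, so generation can take minutes per response. It's a way to run a model at all, not a way to run it well.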
For example, reasoning-focused models with characteristics similar to o1 include the following.