Running an LLM with high output quality locally

Hi everyone,

I need to run an LLM locally due to confidentiality concerns. I was thinking of running DeepSeek V3, but I don’t have more than roughly $3.5K to spend on a GPU. Do you think there are options that would perform about as well as ChatGPT-4o (ish)?

Thanks a lot for your time!!


OK, I’ve been told that “performing as well as” is not specific enough :sneezing_face: I’d need roughly 7–10 tokens/s and high output quality.


Let me say this first: that’s reckless… :sob: The amount of money needed is off by an order of magnitude.

Possible compromises would be to use a weaker model with similar characteristics, or to accept hopelessly slow speeds and load the model into RAM or onto an SSD instead of VRAM and run inference from there. RAM is cheap now, although consumer Windows (Home) caps you at 128 GB…
Hugging Face’s accelerate library can also run models offloaded to SSD; a rough sketch is below.
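
To make that concrete, here is a minimal sketch of CPU/SSD offload with transformers + accelerate. The model ID is just an example (any causal LM repo you have access to works), and the offload folder is simply a directory on the SSD:

```python
# Minimal sketch of GPU -> CPU RAM -> SSD offload with transformers + accelerate.
# The model ID is only an example; swap in whatever you actually want to run.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-72B-Instruct"  # example model, not a specific recommendation

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",          # accelerate spreads layers across GPU, then CPU RAM, then disk
    offload_folder="offload",   # layers that fit nowhere else are paged to this SSD folder
)

inputs = tokenizer("Summarize the following clause:", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```

Expect single-digit (or worse) tokens per second once layers start living on CPU RAM or SSD; that’s the compromise.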

For example, models with characteristics similar to o1 (reasoning-focused models) are as follows.


https://unsloth.ai/blog/deepseekr1-dynamic
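
For reference, a rough sketch of grabbing one of those dynamic quants and running it with llama.cpp. The repo ID, file pattern, and CLI flags below are assumptions based on that post, so check the blog for the exact names:

```python
# Sketch: download one of Unsloth's dynamic DeepSeek-R1 GGUF quants with huggingface_hub.
# Repo ID and file pattern are assumptions taken from the linked blog post; verify them there.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="unsloth/DeepSeek-R1-GGUF",   # assumed repo name from the blog
    allow_patterns=["*UD-IQ1_S*"],        # the ~1.58-bit dynamic quant shards
    local_dir="DeepSeek-R1-GGUF",
)

# Then run it with llama.cpp (built separately), pointing at the first GGUF shard:
#   ./llama-cli -m DeepSeek-R1-GGUF/<first-shard>.gguf --prompt "Hello" -n 128 --threads 16
```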


Thanks a lot for your reply!!

Two questions: 1) Is there a way to test the output quality of deepseekr1-dynamic, or is the only way to do this to deploy the model on a Google Cloud VM, for instance? 2) If I raise my budget to $6K, would I be able to run this model comfortably?

Also, I found a thread on Twitter explaining how someone managed to run R1 on a $6K machine using only RAM and two CPUs. Do you know whether I could apply the same logic to a config that would run DeepSeek V3? :thinking:

Thanks again for your time, and sorry for my very ignorant questions :')


Evaluating output quality yourself isn’t easy. Someone has most likely already checked whether a given quant is usable or not; you should rely on those reports.

Anyway, the lower the budget, the more knowledge and experience you need…
It’s a juggling act.
$6K is a huge amount, but even $60K wouldn’t be enough… :sob:

That said, the expensive part is the GPU, so if you’re assuming CPU and RAM, $3K should be enough. The problem is purely accuracy and speed: it’s a question of how much you’re willing to compromise.

The full-size V3 and R1 models are originally distributed in 8-bit precision, but people running them on CPU and RAM are forced to quantize them down to 4-bit, 2-bit, or even ~1.5-bit, so of course output quality is sacrificed. Without reducing the precision, though, they simply won’t run on a home PC; that’s how big these models are. Some back-of-the-envelope numbers are below.
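
To give a feel for the sizes, here is a back-of-the-envelope calculation assuming roughly 671B parameters for V3/R1 (weights only, ignoring KV cache, activations, and quantization metadata):

```python
# Rough weight-memory footprint for a ~671B-parameter model (DeepSeek V3/R1 scale).
params = 671e9

for bits in (8, 4, 2, 1.58):
    gib = params * bits / 8 / 2**30
    print(f"{bits:>5}-bit weights: ~{gib:,.0f} GiB")

# Roughly: 8-bit ~625 GiB, 4-bit ~312 GiB, 2-bit ~156 GiB, 1.58-bit ~123 GiB.
# Even the most aggressive quant is far beyond any consumer GPU's VRAM,
# which is why people spill the weights into system RAM or onto SSD.
```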

For practical use, as opposed to research or demos, you can often get better results by starting with a smaller model at 4-bit or 8-bit precision instead; see the sketch below.
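
As a sketch of that last option, this loads a mid-size instruct model in 4-bit with bitsandbytes. The model ID is just an example of a “smaller” model, not a specific recommendation:

```python
# Sketch: run a smaller model in 4-bit instead of a heavily squeezed giant one.
# Example model only; a 32B model in 4-bit needs roughly 20+ GB of VRAM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "Qwen/Qwen2.5-32B-Instruct"  # example of a capable mid-size model

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

inputs = tokenizer("Draft a confidentiality clause for ...", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=200)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```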