I’m a complete newbie to this AI topic and I need some help.
My hardware is a Ryzen 5 7600, 16 GB RAM, and an NVIDIA RTX 3050 8 GB graphics card, running Ubuntu 24.04.2 LTS with Linux 6.11.0-1016-lowlatency x86_64, NVIDIA’s open driver 570.158.01, and CUDA 12.4.127 in the virtual environment.
It’s not a really powerful computer, but it shouldn’t be so weak that things run this slowly.
DeepSeek inference is extremely slow through the Transformers interface. I recently installed BitsAndBytes.
I’d like to know the recommended parameters to start a CLI chatbot interface, or an IPython one. That’s the reason for this post.
Please keep in mind that I’m just a newbie; I haven’t worked with any AI model before. My model is Open-R1 from Hugging Face.
Thank you very much.
With those GPU specs, I think you’d be better off using Ollama or vLLM to get good performance with less effort. The GGUF format used with Ollama isn’t particularly fast in itself, but it’s excellent for offloading when VRAM is limited.
Generally speaking, Transformers offers advanced customization options, but if you don’t need that level of customization, the other backends are easier to use.
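If you do stay with Transformers for now, 4-bit BitsAndBytes quantization is the usual way to squeeze a ~7B model into 8 GB of VRAM. Here is a minimal sketch; the model id is just a placeholder (swap in the exact Open-R1 checkpoint you downloaded), and the generation settings are reasonable defaults rather than official recommendations:

```python
# Minimal sketch: 4-bit (NF4) loading with Transformers + BitsAndBytes.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "open-r1/your-checkpoint-here"  # placeholder: use your actual Open-R1 model id

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # quantize weights to 4 bits
    bnb_4bit_quant_type="nf4",             # NF4 usually gives the best quality
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 on the RTX 3050
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # spills layers to CPU RAM if 8 GB VRAM is not enough
)

messages = [{"role": "user", "content": "Hello, who are you?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(inputs, max_new_tokens=256, do_sample=True, temperature=0.7)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```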
How can I use these things?
I have just installed Ollama.
Thank you very much. Excuse me, I’m a newbie.
Ollama is the quickest option.
Once you download and install Ollama, you should be able to launch it directly from the command line.
Ollama runs as a local server and can also be used as an API server compatible with the OpenAI API, so if you want to create your own chatbot GUI, you can access it via that API. It also handles load balancing.
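For example, here is a minimal sketch of a tiny CLI chatbot that talks to the local Ollama server through its OpenAI-compatible endpoint. It assumes Ollama is already running on the default port 11434 and that you have pulled a model; the tag `deepseek-r1:8b` is only an example, so use whatever `ollama list` shows on your machine:

```python
# Minimal sketch: chat with a local Ollama server via its OpenAI-compatible API.
import requests

OLLAMA_URL = "http://localhost:11434/v1/chat/completions"
MODEL = "deepseek-r1:8b"  # assumption: replace with your local model tag

def chat(prompt: str) -> str:
    """Send one user message and return the assistant's reply."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=300)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

if __name__ == "__main__":
    # Tiny CLI loop: type a message, get a reply, Ctrl-C to quit.
    while True:
        user_input = input("you> ")
        print(chat(user_input))
```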
Incidentally, you can achieve the same thing in the Transformers ecosystem using TGI (Text Generation Inference). It’s also possible with vLLM.
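If you’d rather try vLLM directly from Python, a minimal offline-inference sketch looks like the one below. Keep in mind that a 7B fp16 model won’t fit in 8 GB of VRAM, so this assumes a quantized (e.g. AWQ) or smaller checkpoint; the model id is a placeholder:

```python
# Minimal sketch of offline inference with vLLM's Python API.
from vllm import LLM, SamplingParams

# Placeholder model id: use a quantized or small checkpoint that fits 8 GB VRAM.
llm = LLM(model="your-quantized-r1-checkpoint", gpu_memory_utilization=0.9)
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Hello, who are you?"], params)
print(outputs[0].outputs[0].text)
```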