Hi, I just need your opinions on something.
I have a chatbot system that runs using the models below:
- Llama 3.2 90B
- Llama 3.2 3B
- Whisper (large)
- nomic-embed-text (Embedding)
What GPU cluster requirements do I need to run the models above?
Apart from Llama 90B, the other models are small (together they probably don't add up to 10GB), so the requirements for Llama 90B dominate. In 4-bit quantization it needs at least 64GB of VRAM; if you want to run it in fp16, you'll need 256GB. On top of loading the weights, inference itself uses some extra VRAM (KV cache and activations), so it's best to have a bit more headroom than the model size alone.
There is a big cost difference between 64GB and 256GB, so it's better to run it quantized (GGUF or NF4) if at all possible. If you're using Ollama or llama.cpp as the server, you just need to use a Q4_K_M-format GGUF.
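If it helps, here is a rough back-of-envelope sketch of where those numbers come from. The bits-per-weight for Q4_K_M (~4.5) and the flat 20% overhead for KV cache and activations are assumptions, not exact figures for any specific runtime:

```python
# Back-of-envelope VRAM estimate for serving an LLM.
# The bits-per-weight and overhead figures are rough assumptions.

def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_fraction: float = 0.2) -> float:
    """Return an approximate VRAM requirement in GB."""
    weights_gb = params_billion * 1e9 * bits_per_weight / 8 / 1e9
    return weights_gb * (1 + overhead_fraction)

for label, bits in [("Q4_K_M (~4.5 bpw)", 4.5), ("fp16 (16 bpw)", 16.0)]:
    print(f"Llama 90B, {label}: ~{estimate_vram_gb(90, bits):.0f} GB")

# Approximate output:
#   Llama 90B, Q4_K_M (~4.5 bpw): ~61 GB
#   Llama 90B, fp16 (16 bpw): ~216 GB
```

Those rough numbers line up with the 64GB / 256GB ballpark above, plus a little on top for the smaller models.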
Hi @John6666
Thank you for your response!
I am currently using Ollama to run Llama models. Would a cluster of four NVIDIA RTX A6000 GPUs be sufficient to handle the model?
Thanks!
That's 192GB of VRAM (4 × 48GB)!
Ollama's default quantization is Q4_K_M (if you don't specify otherwise, it uses this, which is unlikely to cause problems), so four A6000s should be more than enough. The model itself consumes a little over 64GB, plus some extra for inference. If a lot of people use it at the same time, or you process very long contexts, VRAM usage goes up, but you're still unlikely to run into trouble.
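For example, a minimal sketch of a request against a local Ollama server, capping the context length per request so the KV cache doesn't balloon VRAM usage. The model tag here is an assumption; use whatever `ollama list` shows on your machine:

```python
import requests

# Minimal sketch: call a local Ollama server (default port 11434) and
# limit the context window per request to keep KV-cache VRAM in check.
resp = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "llama3.2-vision:90b",   # assumed tag; check `ollama list`
        "prompt": "Summarize our return policy in two sentences.",
        "stream": False,
        "options": {"num_ctx": 4096},     # smaller context -> smaller KV cache
    },
    timeout=600,
)
print(resp.json()["response"])
```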
By the way, Ollama is fast enough, but llama.cpp seems to be even faster. If you run into speed problems, you might want to try switching.
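If you do try llama.cpp, its llama-server binary exposes an OpenAI-compatible chat endpoint, so the client side barely changes. A minimal sketch, assuming the server was started with your Q4_K_M GGUF on the default port 8080:

```python
import requests

# Minimal sketch: query llama.cpp's llama-server via its
# OpenAI-compatible /v1/chat/completions endpoint.
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "model": "llama-3.2-90b-q4_k_m",  # assumed name; the server serves whatever GGUF it loaded
        "messages": [{"role": "user", "content": "Hello!"}],
        "max_tokens": 128,
    },
    timeout=600,
)
print(resp.json()["choices"][0]["message"]["content"])
```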