Hi all,
Could you please share the hardware specs (RAM, VRAM, GPU, CPU, SSD) for a server that will host meta-llama/Llama-3.2-11B-Vision-Instruct for my RAG application? I need excellent response time for a good customer experience.
Thanks for your support…
You can do the math from the model's parameter count. With quantization, you can run it in about 12 GB of VRAM. Without quantization, you need more than 30 GB of VRAM. You have to decide which trade-off fits your use case.
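As a rough back-of-the-envelope check (my own sketch, not an official figure), weight memory is roughly parameter count × bytes per parameter, plus overhead for activations and the KV cache; the ~10.7B parameter count for the 11B Vision model is my assumption here:

```python
def estimate_weight_vram_gb(num_params_b: float, bits_per_param: int) -> float:
    """Rough VRAM needed just for the weights, in GiB."""
    bytes_per_param = bits_per_param / 8
    return num_params_b * 1e9 * bytes_per_param / 1024**3

# Llama-3.2-11B-Vision has roughly 10.7B parameters (assumption).
for bits in (16, 8, 4):
    gb = estimate_weight_vram_gb(10.7, bits)
    print(f"{bits}-bit weights: ~{gb:.1f} GiB (+ KV cache and activation overhead)")
```

That gives roughly 20 GiB for 16-bit weights and 5 GiB for 4-bit weights, which is why the practical totals land around ">30 GB" unquantized and "~12 GB" quantized once you add runtime overhead.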
Thanks so much for the professional help, John…
For Llama 3.2, what is the model's weight precision (footprint per parameter)? Is it 32-bit, 16-bit, or 8-bit?
Can I change it, or is it fixed when I download the model?
The precision can easily be specified when the model is loaded (see the sketch below), and the same is true for the typical quantization methods. However, quantization basically requires a GPU, which is difficult on an ordinary laptop. A gaming laptop can handle it, but I think running AI workloads on a laptop will shorten its life…
Games don’t keep the GPU at full power continuously, but AI inference runs it at full power the whole time.
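For what it's worth, here is a minimal sketch of choosing the precision at load time with transformers and bitsandbytes (the model ID comes from the question above; a CUDA GPU plus the accelerate and bitsandbytes packages are assumed):

```python
import torch
from transformers import MllamaForConditionalGeneration, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Option 1: load the released weights in bf16 (needs roughly 20+ GB of VRAM).
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # precision is chosen here, not at download time
    device_map="auto",
)

# Option 2: quantize on the fly to 4-bit (fits in roughly 12 GB of VRAM).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

Either way the same checkpoint is downloaded; the precision is only decided when the weights are loaded into memory, which is why you don't need to re-download anything to change it.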