Hi all,
Could you please share the hardware specs (RAM, VRAM, GPU, CPU, SSD) for a server that will host meta-llama/Llama-3.2-11B-Vision-Instruct for my RAG application? I need excellent response time for a good customer experience.
Thanks for your support…
You can do the math from the model's parameter count. With quantization, you can run it in about 12 GB of VRAM. Without quantization, you need more than 30 GB of VRAM. You have to decide which trade-off fits your use case.
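As a rough back-of-the-envelope check (my own sketch, not an official figure), weight memory is roughly parameter count × bytes per parameter, plus overhead for activations and the KV cache; the ~10.7B parameter count for the 11B Vision model is my assumption here:

```python
def estimate_weight_vram_gb(num_params_b: float, bits_per_param: int) -> float:
    """Rough VRAM needed just for the weights, in GiB."""
    bytes_per_param = bits_per_param / 8
    return num_params_b * 1e9 * bytes_per_param / 1024**3

# Llama-3.2-11B-Vision has roughly 10.7B parameters (assumption).
for bits in (16, 8, 4):
    gb = estimate_weight_vram_gb(10.7, bits)
    print(f"{bits}-bit weights: ~{gb:.1f} GiB (+ KV cache and activation overhead)")
```

That gives roughly 20 GiB for 16-bit weights and 5 GiB for 4-bit weights, which is why the practical totals land around ">30 GB" unquantized and "~12 GB" quantized once you add runtime overhead.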
Thanks so much for the professional help, John…
For Llama 3.2, what is the model's weight precision (footprint per parameter)? Is it 32-bit, 16-bit, or 8-bit?
Can I change it, or is it fixed when I download the model?
The precision can easily be specified when the model is loaded (see the sketch below), and the same is true for the typical quantization methods. However, quantization basically requires a GPU, which is difficult on an ordinary laptop. A gaming laptop can handle it, but I think running AI workloads on a laptop will shorten its life…
Games don’t keep the GPU at full power continuously, but AI inference runs it at full power the whole time.
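For what it's worth, here is a minimal sketch of choosing the precision at load time with transformers and bitsandbytes (the model ID comes from the question above; a CUDA GPU plus the accelerate and bitsandbytes packages are assumed):

```python
import torch
from transformers import MllamaForConditionalGeneration, BitsAndBytesConfig

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"

# Option 1: load the released weights in bf16 (needs roughly 20+ GB of VRAM).
model = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # precision is chosen here, not at download time
    device_map="auto",
)

# Option 2: quantize on the fly to 4-bit (fits in roughly 12 GB of VRAM).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model_4bit = MllamaForConditionalGeneration.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)
```

Either way the same checkpoint is downloaded; the precision is only decided when the weights are loaded into memory, which is why you don't need to re-download anything to change it.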