16 GB vs 20 GB graphics card

If I have to choose between an RTX 4060 Ti 16 GB and an RTX 4000 ADA 20 GB, where the latter is three times more expensive, is there any advantage to having 20 GB vs 16 GB of VRAM? Will I be able to fit larger and better models in 20 GB than in 16 GB?

Thanks


The more VRAM, the better.
Even 40 GB is not enough for some workloads: what doesn't fit simply doesn't fit.
For example, if you want to run the image-generation model Flux without quantization, you will need about 35 GB in total. On the other hand, if you use SDXL, 16 GB is enough.
For LLMs, the following calculation method is helpful: estimate the VRAM from the model's parameter count and the precision you plan to run it at.
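As a rough back-of-the-envelope sketch (the ~20% overhead factor for the KV cache and activations is my own assumption, not an official formula), inference VRAM is roughly the parameter count times the bytes per parameter:

```python
# Rough VRAM estimate for LLM inference. The 1.2x overhead for KV cache,
# activations and CUDA buffers is an assumed ballpark, not a fixed rule.

BYTES_PER_PARAM = {"fp32": 4.0, "fp16/bf16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_vram_gb(params_billions: float, precision: str, overhead: float = 1.2) -> float:
    """Approximate VRAM in GB to run a dense model at the given precision."""
    # params_billions * 1e9 params * bytes/param / 1e9 bytes-per-GB
    return params_billions * BYTES_PER_PARAM[precision] * overhead

for size_b in (8, 70):                      # illustrative model sizes
    for precision in ("fp16/bf16", "int4"):
        print(f"{size_b}B @ {precision}: ~{estimate_vram_gb(size_b, precision):.0f} GB")
```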

The RTX 4000 ADA appears to be more of a power-efficient, long-term-stable workstation product than a VRAM-rich model. It is probably aimed at data scientists and servers.
If you are just using it for fun, a higher-end GeForce model would be cheaper and have more VRAM, or, if your PC case is large enough, you could opt for a multi-GPU setup.

Many thanks for your reply! My use case is just learning; I am not gaming at all. Unfortunately my constraint is power-related: I can only use a GPU that draws up to 225 W.
I have looked at the currently most popular model, Llama-3.1-Nemotron-70B-Instruct-HF, and asked Gemini how much VRAM I would need.
Here is Gemini’s reply:

You’re looking at a powerful large language model! Here’s a breakdown of the VRAM requirements for nvidia/Llama-3.1-Nemotron-70B-Instruct-HF:

VRAM Needs

* **Full Precision (bfloat16):** Approximately 140 GB
* **Half Precision (fp16):** Approximately 70 GB
* **Quantized (4-bit):** Around 18-20 GB

Important Considerations:

* **Quantization:** To run this model on a single, consumer-grade GPU, you’ll need to use quantization techniques. 4-bit quantization is a good balance between performance and VRAM usage.
* **GPU Memory Type:** GDDR6 memory is generally faster than GDDR5, which can impact performance.
* **Other Processes:** Keep in mind that other running applications and the operating system itself will consume some VRAM. Make sure you have enough free memory beyond the model’s requirements.

Recommendations:

* **Ideal:** A GPU with 40 GB of VRAM (e.g., NVIDIA A100 40GB) or more would be ideal for comfortable use, even allowing for some headroom.
* **Minimum:** You could potentially run it on a 24 GB card (e.g., RTX 3090) with 4-bit quantization, but it will be very tight, and you might encounter memory issues, especially with longer context windows or larger batch sizes.
* **20 GB Cards:** Cards with 20 GB of VRAM might be just on the edge of usability. You’ll likely need aggressive quantization and might have limitations on context length and batch size.

Where to Find Information

* **Hugging Face Model Card:** The model card on Hugging Face often provides information about VRAM requirements and recommended hardware.
* **Community Discussions:** Look for discussions and forums related to the specific model. Others who have experimented with it might share their experiences and VRAM usage.

In summary:

While it is technically possible to squeeze this model onto a 20 GB card with heavy quantization, it’s not recommended. For a smoother and more reliable experience, aim for at least 24 GB of VRAM, with 40 GB or more being ideal.

So, based on this answer, if it is correct, I would need at least 20 GB of VRAM.
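One practical way to sanity-check numbers like these is to sum up the checkpoint file sizes on the Hub, since the weights on disk are a lower bound on the VRAM needed at that precision (note that bf16 and fp16 both use 2 bytes per parameter, so a 70B model lands around 140 GB in either). A sketch using huggingface_hub, which I have not run against this exact repo; gated repos require logging in first:

```python
# Sketch: sum the safetensors sizes of the repo as a proxy for the VRAM the
# unquantized weights would need. Assumes huggingface_hub is installed and the
# repo is accessible (log in with `huggingface-cli login` if it is gated).
from huggingface_hub import HfApi

info = HfApi().model_info(
    "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF", files_metadata=True
)
weight_bytes = sum(
    (f.size or 0) for f in info.siblings if f.rfilename.endswith(".safetensors")
)
print(f"Checkpoint weights on disk: ~{weight_bytes / 1e9:.0f} GB")
```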

Gemini is smart! It’s almost right.
I think anyone who runs a 70B model from HF locally is basically using a 4090 or better, assuming 4-bit quantization, of course.
It’s really hard to do anything beyond simple use when you’re right at the edge of your VRAM, so you might want to try a smaller model instead.
Larger models can be run on GPUs rented for a fee from HF or other sites, which is not a bad idea for learning purposes. (For example, Google Colab’s free tier gives you 16 GB and HF’s $10/month plan gives you 40 GB, both with various limitations.)
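For completeness, a minimal 4-bit loading sketch with transformers + bitsandbytes (untested here; the quantization settings are common defaults, not tuned recommendations). The 70B model from this thread still needs roughly 40 GB even at 4-bit, so it is more of a rented-GPU job; swap in a smaller model for a 16-20 GB card:

```python
# Minimal 4-bit inference sketch. Assumes transformers, accelerate and
# bitsandbytes are installed; settings below are illustrative defaults.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "nvidia/Llama-3.1-Nemotron-70B-Instruct-HF"  # ~40 GB even in 4-bit;
# pick a smaller model if you only have a 16-20 GB card.

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",  # let accelerate place the layers on the available GPU(s)
)

prompt = "Explain in one sentence how much VRAM a 70B model needs at 4-bit."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```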

But the problem is power. The rated power-supply capacity is a fairly theoretical value to begin with, so in practice you should expect only 60-70% of it to be usable. If possible, replacing the power supply is the cheapest option. If that’s not possible, then you just have to weigh the extra 4 GB against the price…
Is it worth it?

It is expensive, but I will probably get the RTX 4000 ADA GPU; it will probably serve me better in the long run. Thanks for replying.


If you are convinced about buying it, that’s fine.