Identify model requirements in memory and disk

I want to find out which models can fit on our GPU. How can I find this information on the Hugging Face model site? For example, when I go to the Llama 3 8B model, I see the total file size is over 200 GB (multiple files). If I deploy this model with vLLM on Kubernetes, does that mean it will require a persistent volume of over 200 GB? Will it also need VRAM of that size? It is not clear to me how to identify the VRAM needs of each model from Hugging Face.


As a general guideline for conventional LLM architectures: when quantized to 4 bits, an 8B-parameter model can run smoothly in 8 GB of VRAM. The quantized weights themselves are smaller than 8 GB, but additional memory is needed during inference for the KV cache (context) and other data. If quantization is not applied, it is safer to estimate 4 to 8 times the parameter count (in billions) in GB of VRAM. This is easy to remember.
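The rule of thumb above can be sketched as a quick calculation. This is a rough estimate only; the `overhead` factor is an assumption standing in for KV cache and activation memory, and real usage varies with context length and batch size:

```python
def estimate_vram_gb(n_params_billions: float, bits_per_param: int,
                     overhead: float = 1.2) -> float:
    """Rough VRAM estimate: weight size in GB plus ~20% headroom
    (the overhead factor is an assumption, not a vLLM parameter)."""
    weight_gb = n_params_billions * bits_per_param / 8
    return weight_gb * overhead

# 8B model, 4-bit quantized: ~4 GB of weights -> fits in 8 GB VRAM
print(round(estimate_vram_gb(8, 4), 1))   # 4.8

# 8B model in fp16 (16 bits per parameter): ~16 GB of weights
print(round(estimate_vram_gb(8, 16), 1))  # 19.2
```

Note that the persistent volume only needs to hold the weight files you actually download (one format, one precision), not the sum of every file listed on the model page.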

Additionally, for formats like GGUF, the Hugging Face model page now shows whether a file can run on the GPU set in your profile, as shown in the image below. If the indicator is green, there's no issue.