Memory Requirements for Running LLMs

Is there a rule-of-thumb calculation for estimating the memory required to run an LLM, as a function of its parameter count? I'm referring to a base model (no quantization), comparing full fine-tuning against pure inference on the pre-trained weights. Let's use Llama 2 7B as an example. Below is what I've seen for pre-training, but I'm not sure how that translates to loading the pre-trained base model and working with it. The disconnect for me is that I'm unsure exactly which components of the model get stored in memory. Any help appreciated!
[Image: memory-requirements breakdown for pre-training]

https://huggingface.co/spaces/hf-accelerate/model-memory-usage is a handy tool for estimating this.
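If you'd rather stay in the terminal, recent versions of Accelerate also ship an `estimate-memory` command that backs a similar calculation; exact flag names may vary by version, so treat this as a sketch:

```
# assumes a recent `pip install accelerate`; flags may differ by version
accelerate estimate-memory meta-llama/Llama-2-7b-hf --library_name transformers
```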

There’s also this nice blog post: Calculating GPU memory for serving LLMs | Substratus.AI
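For a quick back-of-the-envelope number without any tooling, here's a rough Python sketch of the usual rule of thumb: inference needs approximately (parameter count × bytes per parameter) plus some headroom for activations and the KV cache, while full fine-tuning with Adam in mixed precision also keeps gradients and optimizer states per parameter. The ~20% inference overhead factor and the 16 bytes/param training figure below are common approximations (the latter from the mixed-precision Adam breakdown: 2 B fp16 weights + 2 B fp16 grads + 4 B fp32 master weights + 4 B momentum + 4 B variance), not exact numbers:

```python
def inference_memory_gb(n_params: float, bytes_per_param: int = 2,
                        overhead: float = 1.2) -> float:
    """Rough memory to serve a model: weights in fp16 (2 bytes each)
    plus ~20% headroom for activations and the KV cache."""
    return n_params * bytes_per_param * overhead / 1e9

def full_finetune_memory_gb(n_params: float, bytes_per_param: int = 16) -> float:
    """Rough memory for full fine-tuning with Adam in mixed precision:
    ~16 bytes/param for weights + gradients + optimizer states,
    before counting activations."""
    return n_params * bytes_per_param / 1e9

n = 7e9  # Llama 2 7B
print(f"inference (fp16): ~{inference_memory_gb(n):.0f} GB")     # ~17 GB
print(f"full fine-tune:   ~{full_finetune_memory_gb(n):.0f} GB")  # ~112 GB
```

Plugging in 7B parameters, the fp16 weights alone are ~14 GB, which is why a 7B model roughly fits on a single 24 GB card for inference but needs far more (or sharding/offloading) for full fine-tuning.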