How to minimize memory consumption when loading pretrained models?

I’ve recently been working with several 7B-scale LLMs. When I load LLaMA2-7B through AutoModelForCausalLM, memory usage is well controlled, around 28GB. However, when I load MPT-7B, which has slightly fewer parameters, it costs 31GB, and I can’t load other LLMs like Falcon-7B at all, since I only have 32GB of system RAM. I’m curious why loading LLaMA2 takes only around 28GB of memory, and how I can load Falcon within a 32GB memory cap.
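
For reference, here is a minimal sketch of how I load each model (the Hub IDs and the `trust_remote_code` flag are just my assumptions about the standard setup; I’m not overriding the dtype, so everything comes in as fp32 on CPU):

```python
import torch
from transformers import AutoModelForCausalLM

# Assumed Hub IDs; I swap between these three models.
model_name = "meta-llama/Llama-2-7b-hf"  # also "mosaicml/mpt-7b" and "tiiuae/falcon-7b"

# No torch_dtype or low_cpu_mem_usage arguments passed,
# so weights are loaded in float32 by default.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,  # MPT and Falcon ship custom modeling code
)
```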