Double expected memory usage

I’ve used a few different models through Hugging Face and consistently noticed approximately double the expected memory usage. For example, the code below uses a model whose checkpoint file is about 1.5 GB, so I expect roughly 1.5 GB of memory usage. Instead, memory usage increases by about 3 GB. Is there something I’m misunderstanding or doing incorrectly?

from transformers import pipeline
from time import sleep

classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
print('Done loading!')
sleep(10)  # keep the process alive so memory usage can be inspected (e.g. in top)
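For reference, one way to check the increase is to compare the process’s resident set size before and after loading. This is just a minimal sketch using psutil (not part of the snippet above), and RSS after loading can understate the transient peak during loading:

import psutil
from transformers import pipeline

process = psutil.Process()
before = process.memory_info().rss  # resident set size in bytes

classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')

after = process.memory_info().rss
print(f"RSS increase: {(after - before) / 1024**3:.2f} GiB")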

So the bad news is that it’s six months after you posted this. But I’ve been reading the docs, which say this happens because the model is first created and then the full checkpoint weights are loaded into it, so the peak RAM is roughly twice the size of the model.

Happily, since v4.20.0 (released June 2022) you can pass low_cpu_mem_usage=True in your from_pretrained() call to address this:

This option can be activated with low_cpu_mem_usage=True. The model is first created on the meta device (with empty weights) and the state dict is then loaded inside it (shard by shard in the case of a sharded checkpoint). This way the maximum RAM used is the full size of the model only.

from transformers import AutoModelForSeq2SeqLM

# Weights are loaded into an empty (meta) model shard by shard,
# so peak RAM stays around the size of the model itself.
t0pp = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp", low_cpu_mem_usage=True)
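For the pipeline from the original question, the same flag can be forwarded to the underlying from_pretrained() call via model_kwargs. A sketch of that (same model as above, nothing else changed):

from transformers import pipeline

classifier = pipeline(
    'zero-shot-classification',
    model='facebook/bart-large-mnli',
    model_kwargs={'low_cpu_mem_usage': True},  # passed through to from_pretrained()
)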