Double expected memory usage

I’ve used a few different models through Hugging Face and consistently noticed approximately double the expected memory usage. For example, the code below uses a model whose checkpoint file is about 1.5 GB, so I expect roughly 1.5 GB of memory usage. Instead, memory usage increases by about 3 GB. Is there something I’m misunderstanding or doing incorrectly?

from transformers import pipeline
from time import sleep

classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')
print('Done loading!')
sleep(10)  # keep the process alive so memory usage can be inspected (e.g. in top)
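For reference, one way to check the increase is to compare the process’s resident set size before and after loading. This is just a minimal sketch using psutil (not part of the snippet above), and RSS after loading can understate the transient peak during loading:

import psutil
from transformers import pipeline

process = psutil.Process()
before = process.memory_info().rss  # resident set size in bytes

classifier = pipeline('zero-shot-classification', model='facebook/bart-large-mnli')

after = process.memory_info().rss
print(f"RSS increase: {(after - before) / 1024**3:.2f} GiB")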

So the bad news is that it’s six months after you posted this. But I’ve been reading the docs, which say this happens because the model is first created and then the full checkpoint weights are loaded into it, so the peak RAM is roughly twice the size of the model.

Happily, since v4.20.0 (released June 2022) you can pass low_cpu_mem_usage=True in your from_pretrained() call to address this:

This option can be activated with low_cpu_mem_usage=True. The model is first created on the meta device (with empty weights) and the state dict is then loaded inside it (shard by shard in the case of a sharded checkpoint). This way the maximum RAM used is the full size of the model only.

from transformers import AutoModelForSeq2SeqLM

# Weights are loaded into an empty (meta) model shard by shard,
# so peak RAM stays around the size of the model itself.
t0pp = AutoModelForSeq2SeqLM.from_pretrained("bigscience/T0pp", low_cpu_mem_usage=True)
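For the pipeline from the original question, the same flag can be forwarded to the underlying from_pretrained() call via model_kwargs. A sketch of that (same model as above, nothing else changed):

from transformers import pipeline

classifier = pipeline(
    'zero-shot-classification',
    model='facebook/bart-large-mnli',
    model_kwargs={'low_cpu_mem_usage': True},  # passed through to from_pretrained()
)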