How much memory is required to load T0pp?

Hi, I’m trying to load the T0pp model (49 GB). However, after quite a while, the system threw a read error. I suppose my machine doesn’t have enough memory to load it. Does anyone know how much memory is required to load the model? Or is there any trick to work around it? Thank you very much.


Hi,

Looking at the model repo, it seems to be 41.5 GB. However, you actually need twice as much CPU RAM to load the model: when calling .from_pretrained(), the model effectively gets loaded twice, once with randomly initialized weights and once with the pretrained weights. @stas has added a new (experimental) argument called low_cpu_mem_usage, which can be set to True in order to load the model only once into CPU memory (directly with the pretrained weights); see this PR. Using that argument, it requires at least 41.5 GB of CPU RAM.
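A minimal sketch of what that call looks like (the guard flag and the choice of AutoModelForSeq2SeqLM are my own illustration; the real load downloads ~41.5 GB of weights, so it is gated behind a flag here):

```python
# Sketch: load T0pp only once into CPU RAM via the experimental flag.
# The actual load is gated behind a flag because it fetches ~41.5 GB of weights.
RUN_HEAVY_LOAD = False  # set to True on a machine with >= 42 GB of free CPU RAM

if RUN_HEAVY_LOAD:
    from transformers import AutoModelForSeq2SeqLM

    model = AutoModelForSeq2SeqLM.from_pretrained(
        "bigscience/T0pp",
        low_cpu_mem_usage=True,  # load the weights once, skipping the random-init copy
    )
```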

Next, if you want to perform inference on GPU, you also need at least the same amount of GPU RAM (41.5 GB) to put the model on it, plus some extra space for the data you feed it and for the activations (i.e. the logits).
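A quick back-of-the-envelope check of those numbers (pure arithmetic, assuming fp32 weights and the ~11B parameter count reported further down by the DeepSpeed estimator):

```python
# Rough memory math for T0pp: ~11B parameters stored in fp32.
params = 11_003_000_000      # total parameter count (from the DeepSpeed estimate below)
bytes_per_param = 4          # fp32 = 4 bytes per parameter
weights_gb = params * bytes_per_param / 2**30

print(f"weights alone:           {weights_gb:.1f} GB")        # ~41 GB, matching the repo size
print(f"plain from_pretrained(): {2 * weights_gb:.1f} GB")    # peak CPU RAM with two copies
```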


Additionally, consider using DeepSpeed with offload enabled:

This model is the same size as t5-11b, so the same setup applies to T0. E.g., here is info on how to load it on a single 40GB GPU for fine-tuning:

The techniques in Performance and Scalability: How To Fit a Bigger Model and Train It Faster (transformers 4.12.0.dev0 documentation) will further reduce memory usage.

Re: DeepSpeed usage for this model, here is the breakdown for 1 GPU (everything but the activation memory):

python -c 'from transformers import AutoModel; \
from deepspeed.runtime.zero.stage3 import estimate_zero3_model_states_mem_needs_all_live; \
model = AutoModel.from_pretrained("bigscience/T0pp"); \
estimate_zero3_model_states_mem_needs_all_live(model, num_gpus_per_node=1, num_nodes=1)'
Estimated memory needed for params, optim states and gradients for a:
HW: Setup with 1 node, 1 GPU per node.
SW: Model with 11003M total params, 131M largest layer params.
  per CPU  |  per GPU |   Options
  276.70GB |   0.49GB | cpu_offload=1, cpu_offload_params=1, zero_init=1
  276.70GB |   0.49GB | cpu_offload=1, cpu_offload_params=1, zero_init=0
  245.95GB |  20.99GB | cpu_offload=1, cpu_offload_params=0, zero_init=1
  245.95GB |  20.99GB | cpu_offload=1, cpu_offload_params=0, zero_init=0
    0.74GB | 184.95GB | cpu_offload=0, cpu_offload_params=0, zero_init=1
   61.49GB | 184.95GB | cpu_offload=0, cpu_offload_params=0, zero_init=0

So you can see that either you need a huge amount of CPU RAM (you can also use NVMe for offloading!), in which case any tiny GPU will do, or you can run it on a single GPU with 40GB or more.
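For reference, a minimal ZeRO-3 config fragment with CPU offload for both optimizer states and parameters might look like this (a sketch based on the DeepSpeed docs; tune the values for your setup, and swap `"device": "cpu"` for `"device": "nvme"` plus an `nvme_path` to offload to NVMe instead):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": { "device": "cpu", "pin_memory": true },
    "offload_param": { "device": "cpu", "pin_memory": true }
  },
  "fp16": { "enabled": "auto" },
  "train_batch_size": "auto"
}
```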

Change num_gpus_per_node=1 to your number of gpus to get the estimate for your setup.

And remember that additional memory will be needed for activations, which depends on the batch size and sequence length.


Thanks, nielsr. That’s very helpful. I’ll give it a try.

Thanks sir, very helpful. I’ll try running DeepSpeed to see if it works.