Reducing `load_state` memory usage

Is there a way to minimize the GPU memory usage when loading a checkpoint with accelerate?

```python
import torch
from accelerate import Accelerator
from transformers import AutoModel

accelerator = Accelerator()
model = AutoModel.from_pretrained(args.model)
optimizer = torch.optim.AdamW(model.parameters())
model, optimizer = accelerator.prepare(model, optimizer)  # This moves the model onto the GPU
accelerator.load_state(checkpoint_dir)  # This loads the checkpoint weights onto the GPU as well
```

Peak GPU memory usage is higher when loading a checkpoint this way, which suggests that accelerate doesn't load the weights in place: for a moment, both the prepared model and the checkpoint's copy of the weights sit on the GPU. I would like to avoid this extra memory overhead, but haven't found an official solution. I know about `accelerate.init_empty_weights`, but as far as I can tell it isn't meant to be used with `accelerator.prepare` and `accelerator.load_state`. Additionally, `accelerator.save_state` does not support sharded weights.
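The only workaround I can think of is to bypass `load_state` for the model weights: restore them on CPU and copy them into the existing parameters in place, so the GPU only ever holds one copy. Below is a minimal sketch for the single-process case; `weights.pt` is a hypothetical file saved earlier with `torch.save(model.state_dict(), ...)`, not accelerate's own checkpoint layout, and the optimizer state would need similar treatment.

```python
import os

import torch
from accelerate import Accelerator
from transformers import AutoModel

accelerator = Accelerator()
model = AutoModel.from_pretrained("bert-base-uncased")  # hypothetical model name
optimizer = torch.optim.AdamW(model.parameters())

# Restore the weights while the model is still on CPU, before prepare()
# moves it to the GPU. "weights.pt" is a hypothetical filename saved with
# torch.save(model.state_dict(), ...); load_state's own files are laid out
# differently depending on the accelerate version.
checkpoint_path = os.path.join("checkpoint_dir", "weights.pt")
if os.path.exists(checkpoint_path):
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    model.load_state_dict(state_dict)  # copies in place into the existing CPU tensors
    del state_dict  # drop the extra CPU copy before moving anything to the GPU

# prepare() now moves the already-restored weights to the GPU once, so peak
# GPU memory stays at a single copy of the model.
model, optimizer = accelerator.prepare(model, optimizer)
```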

Hi @pratogab, I don't think we have a way to minimize GPU memory usage when loading a checkpoint with accelerate. Since each method (DDP, FSDP, DeepSpeed) has its own way of loading the model in `prepare`/`load_state_dict`, it would be quite complicated to enable this.
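To make the difficulty concrete, here is a rough sketch (not accelerate's internals) of the branching a generic in-place loader would need; the comments describe each backend's constraint as I understand it:

```python
from accelerate import Accelerator
from accelerate.utils import DistributedType

accelerator = Accelerator()

# Each backend stores and restores weights differently, so a single
# generic "load in place" path is hard to provide:
if accelerator.distributed_type == DistributedType.DEEPSPEED:
    # DeepSpeed partitions parameters inside its engine, so checkpoints
    # have to be restored through the engine's own loading machinery.
    pass
elif accelerator.distributed_type == DistributedType.FSDP:
    # FSDP flattens and shards parameters across ranks; state dicts must
    # be gathered or scattered with FSDP's state-dict handling.
    pass
else:
    # Plain single-GPU / DDP: parameters are ordinary dense tensors, so a
    # CPU-side load_state_dict before prepare() (as sketched above) works.
    pass
```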