I am trying to run multi-GPU inference for Llama 2 7B. I am running on NVIDIA RTX A6000 GPUs, so the model should fit on a single GPU. I set up the accelerate config file as follows:
Which type of machine are you using?
multi-GPU
How many different machines will you use (use more than 1 for multi-node training)? [1]:
Should distributed operations be checked while running for errors? This can avoid timeout issues but will be slower. [yes/NO]:
Do you wish to optimize your script with torch dynamo?[yes/NO]:
Do you want to use DeepSpeed? [yes/NO]:
Do you want to use FullyShardedDataParallel? [yes/NO]:
Do you want to use Megatron-LM ? [yes/NO]:
How many GPU(s) should be used for distributed training? [1]:2
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:4,5,6,7
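As far as I understand, these answers produce a default_config.yaml roughly like the following (reconstructed from memory and trimmed to the relevant fields, so the exact keys may differ slightly):

compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
num_machines: 1
num_processes: 2
gpu_ids: 4,5,6,7
machine_rank: 0
use_cpu: false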
I run the script with accelerate launch and get the following out-of-memory error:
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 25.10 GiB. GPU 0 has a total capacty of 47.54 GiB of which 21.88 GiB is free. Including non-PyTorch memory, this process has 25.65 GiB memory in use. Of the allocated memory 25.23 GiB is allocated by PyTorch, and 12.97 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
triggered by the line
model, dataloader = accelerator.prepare(model, dataloader)
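For context, the relevant part of my script looks roughly like this (the model id, prompts, and generation loop are simplified stand-ins, but the structure is what I run):

import torch
from torch.utils.data import DataLoader
from transformers import AutoModelForCausalLM, AutoTokenizer
from accelerate import Accelerator

accelerator = Accelerator()

model_id = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# trivially small dataset: a couple of tokenized prompts
prompts = ["Hello, how are you?", "The capital of France is"]
dataset = [tokenizer(p, return_tensors="pt").input_ids.squeeze(0) for p in prompts]
dataloader = DataLoader(dataset, batch_size=1)

# this is the line that raises the CUDA out-of-memory error above
model, dataloader = accelerator.prepare(model, dataloader)

# the generation loop below is never reached
model.eval()
with torch.no_grad():
    for batch in dataloader:
        outputs = accelerator.unwrap_model(model).generate(
            batch.to(accelerator.device), max_new_tokens=20
        )
        print(tokenizer.decode(outputs[0], skip_special_tokens=True))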
Why is this happening? The dataloader wraps a trivially small dataset, the GPU has a capacity of 49140 MB, and Llama 2 7B is only about 13500 MB, so I should not be running into memory issues.
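For completeness, here is the back-of-the-envelope math behind those numbers (the 13500 MB figure assumes 2-byte/fp16 weights, which is how I arrived at it):

# rough numbers behind the claim above (assumes fp16, i.e. 2 bytes per parameter)
params = 7e9                        # Llama 2 7B
bytes_per_param = 2                 # fp16
weights_mib = params * bytes_per_param / 2**20
print(weights_mib)                  # ~13,350 MiB for the weights

gpu_total_mib = 49140               # RTX A6000 total, as reported by nvidia-smi
print(gpu_total_mib - weights_mib)  # ~35,790 MiB should still be free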