`Accelerator.prepare` utilizes only one GPU instead of all 8 available GPUs and raises "CUDA out of memory"

How can we enable multi-GPU utilization to prevent the following error?

Minimal example:

```python
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()
# Load a ~6B-parameter model and hand it to Accelerate
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-6b-mono")
model = accelerator.prepare(model)
```

It raises:

```
OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 10.75 GiB total capacity; 10.08 GiB already allocated; 81.62 MiB free; 10.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

The output of `accelerate env`:

- `Accelerate` version: 0.21.0
- Platform: Linux-3.10.0-1160.81.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.10.11
- Numpy version: 1.25.0
- PyTorch version (GPU?): 2.0.1 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 503.31 GB
- GPU type: NVIDIA GeForce RTX 2080 Ti
- `Accelerate` default config:
	- compute_environment: LOCAL_MACHINE
	- distributed_type: NO
	- mixed_precision: no
	- use_cpu: False
	- num_processes: 1
	- machine_rank: 0
	- num_machines: 1
	- gpu_ids: all
	- rdzv_backend: static
	- same_network: True
	- main_training_function: main
	- downcast_bf16: no
	- tpu_use_cluster: False
	- tpu_use_sudo: False
	- tpu_env: []

We’re using a single machine with 8 GPUs, each with 10 GB of memory. Monitoring nvidia-smi revealed that Accelerate utilizes only one of the GPUs and raises the exception above as soon as that single GPU's memory is full.

When calling `prepare` and training a model, the full model needs to fit into memory on a single GPU. The reason you are only using one GPU is that Accelerate is configured for one: you specified `num_processes: 1`. It needs to be 8 to use all 8 GPUs you have, coupled with `accelerate launch`.
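As a back-of-the-envelope check of why this model cannot fit on one card: the ~6B parameter count is taken from the model name, and the fp32 assumption reflects `from_pretrained`'s default dtype, so this is a rough sketch rather than an exact measurement:

```python
# Rough estimate: codegen-6b-mono has ~6e9 parameters, loaded in
# float32 (4 bytes per parameter) by default.
params = 6e9
bytes_per_param = 4                      # fp32 default
weights_gib = params * bytes_per_param / 2**30

gpu_capacity_gib = 10.75                 # from the OOM message above

print(f"weights alone: ~{weights_gib:.1f} GiB vs {gpu_capacity_gib} GiB per GPU")
# The weights alone (~22 GiB) exceed a single 10.75 GiB RTX 2080 Ti,
# before counting activations or the CUDA context.
```

Even in fp16 (~11 GiB of weights) the model would still be a tight fit on a 10.75 GiB card once overhead is included.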

@muellerzr, thanks for your response.

> When calling `prepare` and training a model, the full model needs to fit into memory on a single GPU. The reason you are only using one GPU is that Accelerate is configured for one: you specified `num_processes: 1`. It needs to be 8 to use all 8 GPUs you have.

We’re not training; we're only trying to do inference. Should we still increase `num_processes`?

> […] + coupled with accelerate launch
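For reference, a minimal sketch of that suggestion, assuming the snippet above is saved as a script (the filename `inference.py` is a placeholder):

```shell
# Option 1: override the process count on the command line
accelerate launch --num_processes 8 inference.py

# Option 2: re-run the interactive config to set 8 processes,
# then launch with the saved default config
accelerate config
accelerate launch inference.py
```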

For running Accelerate in an interactive IPython session, should we use `accelerate launch ipython`?