`Accelerator.prepare` utilizes only one GPU instead of all 8 available GPUs and raises "CUDA out of memory"

How can we enable multi-GPU utilization to prevent the following error?

Minimal example:

```python
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()
# Load a ~6B-parameter model and hand it to Accelerate
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-6b-mono")
model = accelerator.prepare(model)
```

It raises:

```
OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 10.75 GiB total capacity; 10.08 GiB already allocated; 81.62 MiB free; 10.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```

The output of `accelerate env`:

- `Accelerate` version: 0.21.0
- Platform: Linux-3.10.0-1160.81.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.10.11
- Numpy version: 1.25.0
- PyTorch version (GPU?): 2.0.1 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 503.31 GB
- GPU type: NVIDIA GeForce RTX 2080 Ti
- `Accelerate` default config:
	- compute_environment: LOCAL_MACHINE
	- distributed_type: NO
	- mixed_precision: no
	- use_cpu: False
	- num_processes: 1
	- machine_rank: 0
	- num_machines: 1
	- gpu_ids: all
	- rdzv_backend: static
	- same_network: True
	- main_training_function: main
	- downcast_bf16: no
	- tpu_use_cluster: False
	- tpu_use_sudo: False
	- tpu_env: []

We’re using a single machine with 8 GPUs, each with 10 GB of memory. Monitoring nvidia-smi revealed that Accelerate utilizes only one of the GPUs and raises the exception above as soon as that single GPU's memory is full.

When calling `prepare` and training a model, the full model needs to fit into memory on a single GPU. The reason you are only using one GPU is that Accelerate is configured for one: you specified `num_processes: 1`. It needs to be 8 to use all 8 GPUs you have, coupled with `accelerate launch`.
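As a back-of-the-envelope check of why this model cannot fit on one card: the ~6B parameter count is taken from the model name, and the fp32 assumption reflects `from_pretrained`'s default dtype, so this is a rough sketch rather than an exact measurement:

```python
# Rough estimate: codegen-6b-mono has ~6e9 parameters, loaded in
# float32 (4 bytes per parameter) by default.
params = 6e9
bytes_per_param = 4                      # fp32 default
weights_gib = params * bytes_per_param / 2**30

gpu_capacity_gib = 10.75                 # from the OOM message above

print(f"weights alone: ~{weights_gib:.1f} GiB vs {gpu_capacity_gib} GiB per GPU")
# The weights alone (~22 GiB) exceed a single 10.75 GiB RTX 2080 Ti,
# before counting activations or the CUDA context.
```

Even in fp16 (~11 GiB of weights) the model would still be a tight fit on a 10.75 GiB card once overhead is included.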

@muellerzr, thanks for your response.

> When calling `prepare` and training a model, the full model needs to fit into memory on a single GPU. The reason you are only using one GPU is that Accelerate is configured for one: you specified `num_processes: 1`. It needs to be 8 to use all 8 GPUs you have.

We’re not training; we're only trying to do inference. Should we still increase `num_processes`?

> […] + coupled with accelerate launch
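For reference, a minimal sketch of that suggestion, assuming the snippet above is saved as a script (the filename `inference.py` is a placeholder):

```shell
# Option 1: override the process count on the command line
accelerate launch --num_processes 8 inference.py

# Option 2: re-run the interactive config to set 8 processes,
# then launch with the saved default config
accelerate config
accelerate launch inference.py
```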

For running Accelerate in an interactive IPython session, should we use `accelerate launch ipython`?