How can we enable multi-GPU utilization to prevent the following error?
Minimal example:

```python
from accelerate import Accelerator
from transformers import AutoModelForCausalLM

accelerator = Accelerator()
model = AutoModelForCausalLM.from_pretrained("Salesforce/codegen-6b-mono")
model = accelerator.prepare(model)
```
It raises:

```
OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 10.75 GiB total capacity; 10.08 GiB already allocated; 81.62 MiB free; 10.09 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
The output of `accelerate env`:
- `Accelerate` version: 0.21.0
- Platform: Linux-3.10.0-1160.81.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.10.11
- Numpy version: 1.25.0
- PyTorch version (GPU?): 2.0.1 (True)
- PyTorch XPU available: False
- PyTorch NPU available: False
- System RAM: 503.31 GB
- GPU type: NVIDIA GeForce RTX 2080 Ti
- `Accelerate` default config:
  - compute_environment: LOCAL_MACHINE
  - distributed_type: NO
  - mixed_precision: no
  - use_cpu: False
  - num_processes: 1
  - machine_rank: 0
  - num_machines: 1
  - gpu_ids: all
  - rdzv_backend: static
  - same_network: True
  - main_training_function: main
  - downcast_bf16: no
  - tpu_use_cluster: False
  - tpu_use_sudo: False
  - tpu_env: []
We’re using a single machine with 8 GPUs, each with 10 GB of memory. Monitoring `nvidia-smi` revealed that `accelerate` uses only one of the GPUs and raises the exception above as soon as that single GPU's memory is full.
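What we were hoping for is that the checkpoint gets sharded across all eight GPUs rather than loaded entirely onto GPU 0. Below is a minimal sketch of what we assume that might look like, using `device_map="auto"` at load time; the `max_memory` values are our own guess for 10 GB cards and the whole approach is an assumption on our part, not something we have verified:

```python
from transformers import AutoModelForCausalLM

# Assumption: device_map="auto" lets Accelerate split the weights across all
# visible GPUs instead of placing the whole model on GPU 0.
model = AutoModelForCausalLM.from_pretrained(
    "Salesforce/codegen-6b-mono",
    device_map="auto",
    # Hypothetical per-GPU cap so some memory is left for activations;
    # eight 10 GB cards are assumed here.
    max_memory={i: "9GiB" for i in range(8)},
)

# Inspect how the layers were distributed across the GPUs.
print(model.hf_device_map)
```

Is something along these lines the intended way to make use of all eight GPUs with `accelerate`, or does it require a different configuration than the one shown above?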