What is the behaviour of pipeline's `device_map="auto"`?

Hello,

I have a question about the behaviour of the pipeline's device_map="auto".

Specifically, with a single A100-40G GPU and batch_size=16, GPU memory usage sits at about 90%, and I sometimes hit out-of-memory (OOM) errors.

If I instead have two A100-40G GPUs and keep batch_size=16, which behaviour should I expect, a or b?
a. GPU0 usage 90%, GPU1 usage 0%, OOM still happens.
b. GPU0 usage < 90%, GPU1 usage > 0%, OOM does not happen.


Hi, @Jony7chu

Thank you for your question about the behavior of device_map="auto" in pipelines.

The device_map="auto" configuration is designed to partition the model across multiple GPUs automatically, based on the available memory of each device. This can help distribute the memory load and potentially prevent out-of-memory (OOM) issues when running large models.

Given your scenario:

  1. With one A100-40G GPU and batch_size=16:

    • The GPU memory usage is high (around 90%), and you might experience OOM errors depending on the model size and batch size.
  2. With two A100-40G GPUs and batch_size=16:

    • The behavior will align with option b:
      • device_map="auto" will split the model across both GPUs.
      • GPU0 and GPU1 will both be utilized, with memory usage on GPU0 dropping below 90% and GPU1 showing non-zero usage (see the memory check after this list).
      • This reduces the likelihood of OOM errors, as the model's memory footprint is split across the two GPUs.
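If you want to confirm the split empirically, a generic PyTorch check like the following (not specific to pipelines) prints per-GPU allocation once the model has loaded:

```python
import torch

# Report how much memory is currently allocated on each visible GPU.
for i in range(torch.cuda.device_count()):
    used = torch.cuda.memory_allocated(i) / 1024**3
    total = torch.cuda.get_device_properties(i).total_memory / 1024**3
    print(f"GPU{i}: {used:.1f} GiB allocated of {total:.1f} GiB")
```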

Keep in mind:

  • The actual memory utilization on each GPU depends on the model architecture and how device_map="auto" partitions the model; large layers may require more memory on one GPU than the other. You can inspect the exact placement with the snippet after this list.
  • Ensure your software environment is set up for multi-GPU use (a CUDA-enabled PyTorch build and the accelerate package, which device_map="auto" relies on).
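To see exactly how the layers were placed, you can print the device map that accelerate records on the model (here `pipe` is the pipeline object from the sketch above):

```python
# Maps module names to device indices (or "cpu"/"disk" if offloading occurred).
print(pipe.model.hf_device_map)
```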

If OOM errors still occur with two GPUs, you might need to reduce the batch size further or explore other options such as gradient accumulation (when training) or constraining the sharding yourself, e.g. with a per-GPU max_memory budget, for finer-grained control; a sketch follows.
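As a rough sketch of that finer-grained control, you can cap the memory accelerate may use on each card with max_memory, leaving headroom for activations, and then hand the sharded model to a pipeline. The checkpoint name and the 30GiB caps below are illustrative assumptions, not recommendations:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

model_name = "meta-llama/Llama-2-7b-hf"  # example checkpoint (assumption)

# Limit accelerate to ~30 GiB per 40 GiB card so activations have headroom.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    max_memory={0: "30GiB", 1: "30GiB"},
    torch_dtype=torch.float16,
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, batch_size=8)
```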

Let me know if you have further questions!
