CUDA OOM: Is it possible to distribute memory usage evenly across 2 GPUs?

hungchiayu · July 12, 2023, 2:55pm

Hi, I'm fine-tuning a 1B-parameter model on 2 GPUs with 24 GB of memory each, using Trainer. When I hit the OOM error, all 24 GB on GPU 0 is used up, but only 7 GB is used on GPU 1. Is there a way to distribute memory usage evenly across the GPUs? I'm already using a batch size of 1 and fp16, but I still run out of memory.
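A rough estimate helps explain why batch size 1 with fp16 may still not fit. The sketch below assumes full fine-tuning with Adam and uses the accounting commonly cited from the ZeRO paper: about 16 bytes per parameter (fp16 weights 2, fp16 gradients 2, fp32 master weights plus Adam moments 12). Activations, the CUDA context and temporary buffers are deliberately left out, so real usage is higher. The `shard_states` option is a hypothetical ZeRO-2-style layout in which gradients and optimizer state are split evenly across GPUs while weights stay replicated. It is an illustration, not a description of what Trainer does by default. Separately, if the script is started with plain `python` instead of a distributed launcher such as `torchrun`, Trainer wraps the model in `torch.nn.DataParallel`. That wrapper keeps the master parameters and optimizer state on GPU 0, which would be consistent with GPU 0 filling up first.

```python
# Back-of-the-envelope per-GPU memory for full fine-tuning with Adam.
# Assumption: ZeRO-paper style mixed-precision accounting (16 bytes/param).
# Activations, CUDA context and temporary buffers are NOT included.

BYTES_WEIGHTS = 2      # fp16 model weights (replicated on every GPU)
BYTES_GRADS = 2        # fp16 gradients
BYTES_OPTIMIZER = 12   # fp32 master weights + Adam first and second moments


def static_memory_gb(num_params: float, num_gpus: int = 1,
                     shard_states: bool = False) -> float:
    """Return per-GPU memory in decimal GB for weights, grads and optimizer state.

    With shard_states=True, gradients and optimizer state are assumed to be
    split evenly across num_gpus (a hypothetical ZeRO-2-style layout).
    """
    weights = num_params * BYTES_WEIGHTS
    states = num_params * (BYTES_GRADS + BYTES_OPTIMIZER)
    if shard_states:
        states /= num_gpus
    return (weights + states) / 1e9


if __name__ == "__main__":
    n = 1e9  # ~1B-parameter model, as in the question
    print(f"all state on one GPU: {static_memory_gb(n):.1f} GB")
    print(f"grads + optimizer state split over 2 GPUs: "
          f"{static_memory_gb(n, 2, True):.1f} GB per GPU")
```

Under these assumptions, roughly 16 GB of a 24 GB card is taken before any activations are stored. Splitting gradients and optimizer state over two GPUs would bring the static part down to about 9 GB per GPU.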

scorinaldi · August 9, 2023, 9:32pm

I am in the same situation! Did you ever find out how to fix this?
