My guess would be that you have a specific sample in your dataset that is very long. Your collate function (not shown) is then probably padding each batch up to the length of its longest sequence. That means that, for instance, your first ~9k steps see inputs of size 128 x 64 (seq_len x batch_size), which does not lead to an OOM. But around step 9k a batch happens to contain a very long sample, producing (for instance) a 384 x 64 input, which does lead to an OOM.
So check the length distribution of your dataset, and check the collate function. You may want to specify a max_length that is smaller than the model's max length after all.
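As a minimal sketch of what that could look like (MAX_LEN and the synthetic batch are placeholders for your own values and data): truncate each sample before padding, so a single outlier can no longer blow up the batch shape.

```python
import torch
from torch.nn.utils.rnn import pad_sequence

MAX_LEN = 128  # choose something smaller than the model max length

def collate_fn(batch):
    # batch: list of 1-D LongTensors of varying length.
    # Truncate first, so padding is bounded by MAX_LEN rather than
    # by the longest sample that happens to land in this batch.
    truncated = [seq[:MAX_LEN] for seq in batch]
    # pad_sequence (batch_first=False) returns seq_len x batch_size
    return pad_sequence(truncated, padding_value=0)

# Synthetic stand-in for a batch: mostly short samples plus one outlier
batch = [torch.ones(64, dtype=torch.long) for _ in range(7)]
batch.append(torch.ones(384, dtype=torch.long))  # the long sample

padded = collate_fn(batch)
print(padded.shape)  # torch.Size([128, 8]) instead of 384 x 8
```

Printing something like `sorted(len(s) for s in dataset)[-10:]` is a quick way to confirm whether such outliers exist before changing the collate function.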