CUDA OOM while saving the model

I am trying to fine-tune FLAN-T5-XXL using PEFT's LoRA method.

Training details -

- dataset_size = 6k records
- instance_type = AWS ml.g5.16xlarge
- batch_size = 2
- gradient_accumulation_steps = 2
- learning_rate = 1e-3
- num_train_epochs = 1 (I want to raise this to 3 or more, but chose 1 to first check whether training completes at all)
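
For context, this is roughly how the setup looks in code. This is a minimal sketch: the 8-bit model loading and the LoRA hyperparameters are my assumptions (typical choices for fitting an 11B model on the single 24 GB A10G of a g5.16xlarge); only the values listed above come from the actual run.

```python
# Sketch of the fine-tuning setup described above. Everything marked as an
# assumption is illustrative and not taken from the actual run.
from transformers import AutoModelForSeq2SeqLM, TrainingArguments
from peft import LoraConfig, TaskType, get_peft_model

# Assumption: FLAN-T5-XXL loaded in 8-bit so it fits on a single 24 GB GPU.
model = AutoModelForSeq2SeqLM.from_pretrained(
    "google/flan-t5-xxl",
    load_in_8bit=True,
    device_map="auto",
)

# Assumption: a typical seq2seq LoRA configuration; r/alpha/dropout are examples.
lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
)
model = get_peft_model(model, lora_config)

# The values below match the post; output_dir is a hypothetical path.
training_args = TrainingArguments(
    output_dir="flan-t5-xxl-lora",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    learning_rate=1e-3,
    num_train_epochs=1,
)
```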

Training completes with this output -
{'train_runtime': 1364.2004, 'train_samples_per_second': 0.733, 'train_steps_per_second': 0.183, 'train_loss': 1.278140380859375, 'epoch': 1.0}

But I am getting a CUDA OOM error when saving the model via the trainer.save_model call.

Details of the error -

OutOfMemoryError: CUDA out of memory. Tried to allocate 40.00 MiB (GPU 0; 22.19 GiB total capacity; 20.34 GiB already allocated; 32.50 MiB free; 20.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
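
For reference, the allocator setting the message points to would be applied like this (a minimal sketch; 128 MB is just an example value, and it has to be set before the process initializes CUDA):

```python
# Allocator hint suggested by the error message itself. This must be set
# before CUDA is initialized, e.g. at the very top of the training script
# or in the shell that launches it. 128 is an example value, not a tuned one.
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch  # import torch only after the environment variable is set
```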

Could anyone help me sort this out?

Aastha

Hi @sgugger! Do you have any suggestions on how to solve this error?

Found a solution - CUDA OOM error while saving the model · Issue #16 · philschmid/deep-learning-pytorch-huggingface · GitHub
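
For anyone hitting the same error: a common workaround for this symptom is to save only the small LoRA adapter instead of having the Trainer serialize the full model. A minimal sketch, assuming the trainer and tokenizer objects from the setup above (whether this matches the linked issue's exact fix is something readers should verify there):

```python
# Workaround sketch: PEFT's save_pretrained on the wrapped model writes only
# the adapter weights (a few MB), avoiding a full-model state dict on the GPU.
# Not confirmed to be the exact fix from the linked issue.
import torch

torch.cuda.empty_cache()  # release cached allocator blocks before saving

# trainer.model is the PeftModel; this saves the adapter weights only.
trainer.model.save_pretrained("flan-t5-xxl-lora-adapter")
tokenizer.save_pretrained("flan-t5-xxl-lora-adapter")  # tokenizer assumed in scope
```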