I am following the tutorial in Practical Exercise: Fine-tune a model with GRPO - Hugging Face NLP Course and am using the configuration below.
I have a machine with 4x T4 16GB GPUs (64GB of GPU memory in total), and even with CUDA_VISIBLE_DEVICES=0,1,2,3 the program runs out of memory. Am I missing something? It seems strange that a 0.5B-parameter model errors out with OOM.
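For reference, I'm launching it along these lines (train_grpo.py is just a placeholder for my actual script name):

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_grpo.py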
from trl import GRPOConfig, GRPOTrainer

# 3. Configure training
training_args = GRPOConfig(
    output_dir="output",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    per_device_eval_batch_size=2,
    logging_steps=10,
)

# 4. Initialize and train
# `dataset` and `reward_func` are defined in steps 1 and 2, following the course tutorial.
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=training_args,
    train_dataset=dataset,
    reward_funcs=reward_func,
)
trainer.train()