CUDA OOM in the course `Fine-tune a model with GRPO`

I am following the tutorial in "Practical Exercise: Fine-tune a model with GRPO" from the Hugging Face NLP Course and am using the configuration below.

I have a machine with 4x T4 16GB GPUs (64GB total GPU memory), and even with CUDA_VISIBLE_DEVICES=0,1,2,3 the program runs out of memory. Am I missing something? It seems strange that a 0.5B-parameter model is erroring out with OOM.

# dataset and reward_func are defined earlier in the tutorial (not shown here)
from trl import GRPOConfig, GRPOTrainer

# 3. Configure training
training_args = GRPOConfig(
    output_dir="output",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    per_device_eval_batch_size=2,
    logging_steps=10,
)

# 4. Initialize and train
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=training_args,
    train_dataset=dataset,
    reward_funcs=reward_func,
)

trainer.train()
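
In case it matters, these are the knobs I understand I could turn down if this really is a memory problem. The parameter names are from TRL's GRPOConfig as far as I can tell, and the values below are only guesses, since GRPO samples num_generations completions per prompt, which multiplies the sequences processed each step:

# Untested lower-memory sketch -- values are guesses, not a verified fix
training_args = GRPOConfig(
    output_dir="output",
    num_train_epochs=3,
    per_device_train_batch_size=2,    # fewer prompts per GPU per step
    gradient_accumulation_steps=8,    # keep a similar effective batch size
    num_generations=4,                # completions sampled per prompt
    max_prompt_length=256,            # cap prompt tokens
    max_completion_length=128,        # cap generated tokens per completion
    gradient_checkpointing=True,      # trade compute for activation memory
    fp16=True,                        # T4 supports fp16 (not bf16)
    logging_steps=10,
)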

I think it’s got more to do with the data pipeline.

I am loading the dataset with dataset = load_dataset("mlabonne/smoltldr", split="train"), which has about 2,000 samples. If I reduce it to about 50 samples, it works fine, i.e. it no longer reports CUDA out of memory (see the snippet below). I will need to look at what the training steps are doing.
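
Something along these lines is enough to do the reduction, using the datasets library; the rest of the training code stays the same:

from datasets import load_dataset

# Load the full split (~2000 samples), then keep only the first 50 rows
dataset = load_dataset("mlabonne/smoltldr", split="train")
dataset = dataset.select(range(50))

# Equivalent: slice the split directly at load time
# dataset = load_dataset("mlabonne/smoltldr", split="train[:50]")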


How do we reduce the samples from 2,000 to 50? Could you please let me know?
Also, are any other changes needed in the code?
