I am following the tutorial in Practical Exercise: Fine-tune a model with GRPO - Hugging Face NLP Course and am using the configuration below.
I have a machine with 4x T4 16GB GPUs (64GB of GPU memory in total), and even with CUDA_VISIBLE_DEVICES=0,1,2,3 the program runs out of memory. Am I missing something? It seems strange that a 0.5B-parameter model errors out with OOM.
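For reference, I'm launching it along these lines (train_grpo.py is just a placeholder for my actual script name):

CUDA_VISIBLE_DEVICES=0,1,2,3 python train_grpo.py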
from trl import GRPOConfig, GRPOTrainer

# 3. Configure training
training_args = GRPOConfig(
    output_dir="output",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=2,
    per_device_eval_batch_size=2,
    logging_steps=10,
)

# 4. Initialize and train
# `dataset` and `reward_func` are defined in steps 1 and 2, following the course tutorial.
trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-0.5B-Instruct",
    args=training_args,
    train_dataset=dataset,
    reward_funcs=reward_func,
)
trainer.train()