I’m fine-tuning Qwen2-0.5B for sequence classification; my dataset is Reddit posts. During fine-tuning it keeps failing with a CUDA out-of-memory (OOM) error. I even cast the model to 16-bit floating point, but it still gives the OOM error.
Here is my code for the model and trainer:
# Load model and tokenizer
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
import torch

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForSequenceClassification.from_pretrained("Qwen/Qwen2-0.5B", num_labels=2)
model.half()  # cast all weights to fp16
print(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
training_args = TrainingArguments(
    output_dir='kaggle/input/output_dir',
    do_train=True,
    eval_strategy="epoch",  # run evaluation each epoch (this implies do_eval=True)
    num_train_epochs=3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    warmup_steps=100,
    weight_decay=0.01,
    eval_accumulation_steps=1,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
)
trainer.train()
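From what I’ve read, fp16 is usually enabled through TrainingArguments rather than model.half(), so the Trainer handles loss scaling itself. Is something like this the right way to do it? (Just a sketch of what I’m planning to try; the gradient_checkpointing and gradient_accumulation_steps values are my own guesses for saving memory.)

# Sketch: let the Trainer manage mixed precision instead of calling model.half().
# The model would then stay in plain fp32, without the .half() call above.
training_args = TrainingArguments(
    output_dir='kaggle/input/output_dir',
    eval_strategy="epoch",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    fp16=True,                      # mixed-precision training with loss scaling
    gradient_checkpointing=True,    # recompute activations to save memory (my guess)
    gradient_accumulation_steps=8,  # simulate a larger batch without extra memory (my guess)
)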
I’m already using a train batch size of 1 and eval_accumulation_steps=1. The GPU is a P100 or a T4 (the two options Kaggle provides).
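One more thing I’m not sure about: Reddit posts can be fairly long, and I don’t know whether my tokenization caps the sequence length. Would truncating like this help with memory? (A sketch; the "text" column name and max_length=512 are just placeholders for my setup.)

# Sketch: cap the token length so long posts don't blow up activation memory.
def tokenize_fn(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = train_dataset.map(tokenize_fn, batched=True)
val_dataset = val_dataset.map(tokenize_fn, batched=True)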
Please help meeeeeeeeee! <3