I’m fine-tuning Qwen2-0.5B for sequence classification; my dataset is Reddit posts. During fine-tuning it keeps failing with a CUDA out-of-memory (OOM) error. I even cast the model to 16-bit floating point, but it still gives the OOM error.
Here is my code for the model and trainer:
# Load model and tokenizer
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from transformers import TrainingArguments, Trainer
import torch

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen2-0.5B")
model = AutoModelForSequenceClassification.from_pretrained("Qwen/Qwen2-0.5B", num_labels=2)
model.half()  # cast all weights to fp16
print(model)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
training_args = TrainingArguments(
    output_dir='kaggle/input/output_dir',
    do_train=True,
    eval_strategy="epoch",  # run evaluation each epoch (this implies do_eval=True)
    num_train_epochs=3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    warmup_steps=100,
    weight_decay=0.01,
    eval_accumulation_steps=1,
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
)
trainer.train()
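From what I’ve read, fp16 is usually enabled through TrainingArguments rather than model.half(), so the Trainer handles loss scaling itself. Is something like this the right way to do it? (Just a sketch of what I’m planning to try; the gradient_checkpointing and gradient_accumulation_steps values are my own guesses for saving memory.)

# Sketch: let the Trainer manage mixed precision instead of calling model.half().
# The model would then stay in plain fp32, without the .half() call above.
training_args = TrainingArguments(
    output_dir='kaggle/input/output_dir',
    eval_strategy="epoch",
    num_train_epochs=3,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    fp16=True,                      # mixed-precision training with loss scaling
    gradient_checkpointing=True,    # recompute activations to save memory (my guess)
    gradient_accumulation_steps=8,  # simulate a larger batch without extra memory (my guess)
)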
I’m already using a train batch size of 1 and eval_accumulation_steps=1. The GPU is a P100 or a T4 (the two options Kaggle provides).
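One more thing I’m not sure about: Reddit posts can be fairly long, and I don’t know whether my tokenization caps the sequence length. Would truncating like this help with memory? (A sketch; the "text" column name and max_length=512 are just placeholders for my setup.)

# Sketch: cap the token length so long posts don't blow up activation memory.
def tokenize_fn(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

train_dataset = train_dataset.map(tokenize_fn, batched=True)
val_dataset = val_dataset.map(tokenize_fn, batched=True)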
Please help meeeeeeeeee! <3