Hello all, I am currently trying to fine-tune Phi-3-mini 128k with DPO. The DPOTrainer takes more than 30 GB of GPU memory for this model. I am using a Kaggle notebook with two Tesla T4 GPUs (15 GB each). Here is my training configuration:
import transformers
from trl import DPOTrainer

training_params = transformers.TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=1,
    warmup_steps=2,
    learning_rate=5e-5,
    fp16=False,
    logging_steps=4,
    optim="paged_adamw_8bit",  # paged 8-bit AdamW to reduce optimizer-state memory
    lr_scheduler_type="cosine",
    report_to="tensorboard",
    gradient_checkpointing=True,
)
trainer = DPOTrainer(
    model,
    ref_model=None,
    args=training_params,
    beta=0.01,
    train_dataset=raw_datasets["train"],
    tokenizer=tokenizer,
    peft_config=lora_config,
    max_prompt_length=7000,
    max_length=8192,
    reference_free=True,
)
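For reference, model, tokenizer, and lora_config are set up roughly as below. This is a simplified sketch, not my exact notebook code; the quantization settings and LoRA hyperparameters are representative assumptions, and raw_datasets is a preference dataset with prompt/chosen/rejected columns.

import torch
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_name = "microsoft/Phi-3-mini-128k-instruct"

# Assumed setup: load the base model in 4-bit so it fits across the two T4s.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,  # T4 does not support bfloat16
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)

# Representative LoRA config; target_modules follow Phi-3's fused projection names.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
    target_modules=["qkv_proj", "o_proj", "gate_up_proj", "down_proj"],
)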
Am I doing anything wrong here? Can someone explain how to resolve this?