I am using Unsloth to fine-tune Llama 3.1 8B on Colab with the ELI5 dataset. The trainer is extremely slow to even build: a step that should finish in a couple of minutes is taking hours. Here is my SFTTrainer code:
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=ds,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,  # no sequence packing
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=10,
        max_steps=100,
        learning_rate=1e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
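For context, model, tokenizer, ds, and max_seq_length come from the usual Unsloth Colab setup, roughly like this (a sketch: the ELI5 dataset path and the column names in the formatting step are placeholders for what my notebook actually does):

from unsloth import FastLanguageModel
from datasets import load_dataset

max_seq_length = 2048

# Load the 4-bit quantized Llama 3.1 8B base model through Unsloth
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    max_seq_length=max_seq_length,
    dtype=None,  # auto-detect: bf16 on Ampere+, fp16 otherwise
    load_in_4bit=True,
)

# Attach LoRA adapters so only a small set of weights is trained
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    random_state=3407,
)

# ELI5 data flattened into a single "text" column
# ("title" and "answer" are placeholder column names)
ds = load_dataset("eli5", split="train")
ds = ds.map(
    lambda batch: {"text": [t + "\n" + a for t, a in zip(batch["title"], batch["answer"])]},
    batched=True,
)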
The progress bar is showing about 15 examples/sec, and that is just the dataset preprocessing; actual training has not even started yet.
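I believe that throughput number comes from the tokenization pass that SFTTrainer runs over the dataset before training begins. A sketch of what I understand it to be doing internally (not its exact code):

# The preprocessing pass I believe SFTTrainer runs at build time:
# it maps the tokenizer over the "text" column with datasets.map,
# which prints the examples/sec progress bar I am seeing.
def tokenize(batch):
    return tokenizer(
        batch["text"],
        truncation=True,
        max_length=max_seq_length,
    )

tokenized = ds.map(
    tokenize,
    batched=True,
    num_proc=2,                      # matches dataset_num_proc=2 above
    remove_columns=ds.column_names,  # keep only input_ids / attention_mask
)

Is this preprocessing step expected to be this slow on Colab with dataset_num_proc=2, or is something misconfigured in my trainer setup?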