Fine-Tuning Flan-T5-Small: Challenges and Unexpected Results
Problem Description
I am fine-tuning the Flan-T5-Small model on a custom dataset using Hugging Face’s transformers library. Despite following recommended practices, the results are not as expected. Here are the key issues and the steps I have taken:
Fine-Tuning Setup
Training Arguments
from datetime import datetime

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir=output_dir,  # assumed to be defined earlier in the script
    run_name=f"flan-t5-finetuning-{datetime.now().strftime('%Y%m%d_%H%M%S')}",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size of 8 per device
    num_train_epochs=3,
    save_steps=500,
    logging_steps=50,
    save_total_limit=1,
    fp16=False,
    dataloader_num_workers=0,
    gradient_checkpointing=True,
    report_to=[],
    resume_from_checkpoint=False,
    max_grad_norm=1.0,
    optim="adamw_torch",
    torch_compile=False,
    learning_rate=2e-5,
    weight_decay=0.05,
    lr_scheduler_type="cosine",
    warmup_steps=100,
)
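For reference, the numbers these arguments imply can be worked out by hand. The sketch below assumes a single device and takes the total of 1356 steps from the progress bar in the Example Training Progress section; the derived variable names are only illustrative, and the example count is approximate because the last partial batch is ignored.

```python
# Back-of-the-envelope numbers implied by the training arguments.
# Assumptions: one device; total_steps is read off the progress bar (12/1356).
per_device_train_batch_size = 1
gradient_accumulation_steps = 8
num_train_epochs = 3
warmup_steps = 100
total_steps = 1356

# Examples consumed per optimizer update.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
# Optimizer updates per epoch.
steps_per_epoch = total_steps // num_train_epochs
# Rough training set size (ignores any final partial batch).
approx_train_examples = steps_per_epoch * effective_batch_size
# Share of training spent in warmup.
warmup_fraction = warmup_steps / total_steps

print("effective batch size:", effective_batch_size)
print("steps per epoch:", steps_per_epoch)
print("approx. training examples:", approx_train_examples)
print("warmup fraction:", round(warmup_fraction, 3))
```

If "iterations" in the observations means optimizer steps, then the first 450 iterations correspond to roughly one full pass over the data.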
Dataset Format
The dataset consists of a single column (Combined), where each row contains:
- An instruction, extracted topic, and the expected output.
- Example:
Input: Write a detailed summary of Autism in 23 sentences, each with about 22 words. Topic: Autism Output: [Expected output text]
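To make the row layout concrete, here is a minimal sketch of how one Combined string could be split into its parts. It assumes the literal markers `Input:`, `Topic:` and `Output:` each appear in that order, as in the example row; the helper name `split_combined` is hypothetical and not part of my training script.

```python
def split_combined(row: str) -> dict:
    """Split one Combined string into instruction, topic and target text.

    Assumes the markers 'Input:', 'Topic:' and 'Output:' appear in that
    order, as in the example row. Only the first 'Output:' is treated as
    the marker, so the target text may itself contain that word.
    """
    before_output, _, target = row.partition("Output:")
    instruction_part, _, topic = before_output.partition("Topic:")
    instruction = instruction_part.replace("Input:", "", 1).strip()
    return {
        "instruction": instruction,
        "topic": topic.strip(),
        "target": target.strip(),
    }


row = (
    "Input: Write a detailed summary of Autism in 23 sentences, "
    "each with about 22 words. Topic: Autism Output: [Expected output text]"
)
parts = split_combined(row)
print(parts["instruction"])
print(parts["topic"])
print(parts["target"])
```

Splitting a row like this makes it easy to inspect which text is being used as the model input and which as the target, since a sequence-to-sequence model such as T5 is trained on separate source and target sequences.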
Observations
- Training Loss Behavior:
  - Loss starts high (e.g., 25) and drops rapidly to 1.5 within the first 450 iterations.
  - Suggests the model learns quickly but may not generalize well.
- Unstable Results:
  - Outputs are repetitive or lack coherence despite low training loss.
Troubleshooting Steps
- Reduced learning rate to 2e-5 with a cosine decay scheduler.
- Increased weight decay to 0.05 for better regularization.
- Introduced gradient checkpointing to manage memory constraints and enable larger models.
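To show what the learning-rate setting amounts to over the run, here is a small sketch of the schedule. It assumes the `cosine` scheduler is a linear warmup followed by a half-cosine decay to zero, which is my understanding of the default behavior in transformers, and it takes the total of 1356 steps from the progress bar. The function `cosine_lr` is an illustrative re-implementation, not a library call.

```python
import math


def cosine_lr(step, base_lr=2e-5, warmup_steps=100, total_steps=1356):
    """Learning rate at an optimizer step: linear warmup, then cosine decay."""
    if step < warmup_steps:
        # Linear ramp from 0 up to base_lr over the warmup steps.
        return base_lr * step / max(1, warmup_steps)
    # Fraction of the post-warmup phase completed, from 0.0 to 1.0.
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    # Half-cosine decay from base_lr down to 0.
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


for step in (50, 100, 200, 450, 1356):
    print(step, f"{cosine_lr(step):.2e}")
```

Under these assumptions, the logged steps 50 to 200 fall either in warmup or very close to the peak learning rate of 2e-5, and the rate only reaches zero at the final step.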
Example Training Progress
[ 12/1356 00:09 < 20:15, 1.11 it/s, Epoch 0.02/3]
Step Training Loss
50 23.682300
100 26.042400
150 12.555300
200 10.276300
Actual Output:
Input Text:
Instructions: Write a detailed summary on Topic in 23 sentences, each with about 22 words.
Topic: Autism
Output:
Generated Output: Instructions: Write a detailed summary on Topic in 23 sentences, each with about 22 words. Topic: Autism
Please help me understand where I am going wrong.