Need help with news summary fine-tuning using Flan-T5

I think it’s better to use a proper hyperparameter search (e.g. `Trainer.hyperparameter_search`) for serious hyperparameter tuning, but there seem to be a few settings that can be tweaked before that.
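For reference, a minimal sketch of what such a search could look like with the Optuna backend (requires `pip install optuna`). `model_init`, `tokenized_train`, and `tokenized_val` are placeholder names, and the search ranges are illustrative, not recommendations:

```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

def hp_space(trial):
    # Illustrative search space; adjust the ranges to your setup.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 3, 15),
    }

trainer = Seq2SeqTrainer(
    model_init=model_init,  # must return a fresh model for each trial
    args=Seq2SeqTrainingArguments(
        output_dir="hp-search",  # placeholder path
        eval_strategy="epoch",   # `evaluation_strategy` on older transformers versions
    ),
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
)

best_run = trainer.hyperparameter_search(
    direction="minimize",  # minimize validation loss
    backend="optuna",
    hp_space=hp_space,
    n_trials=10,
)
print(best_run.hyperparameters)
```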

Also, the `AdamW` implementation bundled with `transformers` was deprecated a while ago in favor of `torch.optim.AdamW`, so I think it’s better to switch the optimizer setting.
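Switching is just a flag on the training arguments (recent versions already default to the PyTorch implementation); a minimal sketch with placeholder values:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-news-summary",  # placeholder path
    optim="adamw_torch",                # torch.optim.AdamW instead of the deprecated transformers AdamW
    weight_decay=0.01,                  # placeholder value
)
```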


The following is by Hugging Chat:

To improve the performance of your news summarization model using Flan-T5, here are the key steps and considerations based on your thought process:

  1. Data Preprocessing and Tokenization:

    • Add Task Prefix: Consider prefixing your input texts with “summarize:” or a similar instruction to help the model understand the task clearly.
    • Adjust Token Length: Evaluate whether the current max_length is too restrictive. Increasing it slightly might allow the model to capture more context, especially in longer articles (see the preprocessing sketch after this list).
  2. Model Configuration:

    • Experiment with LoRA Rank: Try increasing the LoRA rank from 32 to perhaps 64 or 84. This could enhance the model’s ability to capture information, but monitor for overfitting (see the LoRA sketch after this list).
    • Learning Rate and Scheduler: Adjust the learning rate (e.g., try 1e-3, a common starting point for LoRA) and consider adding a learning-rate scheduler such as linear decay with a short warmup.
  3. Training Parameters:

    • Increase Epochs: Extend the number of training epochs to 15 or more to give the model more time to learn, while monitoring validation loss closely to prevent overfitting.
    • Data Collator Settings: Review and possibly adjust settings within DataCollatorForSeq2Seq to enhance data handling and processing efficiency.
  4. Regularization Techniques:

    • Dropout Adjustment: Increase the dropout parameter in LoRA from 0.05 to 0.1 to add more regularization.
    • Gradient Clipping: Implement gradient clipping to prevent exploding gradients and stabilize training (note that the Trainer already clips at max_grad_norm=1.0 by default).
  5. Training Strategy:

    • Gradient Accumulation: Use gradient accumulation steps to effectively increase the batch size without exceeding hardware limits.
    • Early Stopping: Incorporate early stopping to halt training when improvement stalls, preventing overfitting (the trainer sketch after this list wires these training settings together).
  6. Evaluation and Metrics:

    • Expand Metrics: While focusing on ROUGE scores, consider incorporating additional evaluation metrics (e.g., BERTScore) for a more comprehensive performance analysis.
  7. Hyperparameter Tuning:

    • Grid Search: Conduct a grid search to identify optimal hyperparameters, including learning rate, weight decay, and LoRA configurations (Trainer.hyperparameter_search, sketched near the top of this reply, can automate this).
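To make point 1 concrete, here is a rough preprocessing sketch. It assumes a `DatasetDict` named `raw_datasets` with `article` and `summary` columns; the column names, model size, and length limits are placeholders:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

prefix = "summarize: "     # task prefix the model sees before each article
max_source_length = 1024   # placeholder; raise it if your articles get truncated
max_target_length = 128    # placeholder

def preprocess(batch):
    # Prefix the inputs, then tokenize articles and reference summaries.
    inputs = tokenizer(
        [prefix + text for text in batch["article"]],
        max_length=max_source_length,
        truncation=True,
    )
    labels = tokenizer(
        text_target=batch["summary"],
        max_length=max_target_length,
        truncation=True,
    )
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = raw_datasets.map(
    preprocess,
    batched=True,
    remove_columns=raw_datasets["train"].column_names,
)
```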
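For points 2 and 4, a possible PEFT/LoRA setup with a higher rank and more dropout; all values are illustrative starting points, not tuned recommendations:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=64,                       # raised from 32; watch validation loss for overfitting
    lora_alpha=128,             # often set to roughly 2x the rank
    lora_dropout=0.1,           # raised from 0.05 for extra regularization
    target_modules=["q", "v"],  # T5 attention projections commonly targeted
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```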
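And for points 3 to 6, a sketch of the trainer setup that builds on the two sketches above (`tokenized`, `tokenizer`, `model`); every hyperparameter value here is a placeholder to tune, not a recommendation:

```python
import numpy as np
import evaluate
from transformers import (
    DataCollatorForSeq2Seq,
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Dynamic padding per batch; label_pad_token_id=-100 keeps padding out of the loss.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, label_pad_token_id=-100)

rouge = evaluate.load("rouge")

def compute_metrics(eval_pred):
    preds, labels = eval_pred
    if isinstance(preds, tuple):
        preds = preds[0]
    # Replace -100 (ignored label positions) before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-news-summary",  # placeholder path
    optim="adamw_torch",
    learning_rate=1e-3,                 # placeholder, as suggested above
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    weight_decay=0.01,
    num_train_epochs=15,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,      # effective batch size of 32 without more memory
    max_grad_norm=1.0,                  # gradient clipping (also the default)
    predict_with_generate=True,
    generation_max_length=128,
    eval_strategy="epoch",              # `evaluation_strategy` on older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # stop when eval loss stalls
)

trainer.train()
```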

By systematically addressing these areas, you can enhance your model’s performance, potentially leading to lower validation loss and improved ROUGE scores. Monitoring each change closely will help determine the most effective adjustments for your specific dataset and task.