Need help with news summary fine-tuning using Flan-T5

I think it’s better to use a proper hyperparameter search (e.g. `Trainer.hyperparameter_search`) for serious hyperparameter tuning, but there seem to be a few settings that can be tweaked before that.
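For reference, a minimal sketch of what such a search could look like with the Optuna backend (requires `pip install optuna`). `model_init`, `tokenized_train`, and `tokenized_val` are placeholder names, and the search ranges are illustrative, not recommendations:

```python
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

def hp_space(trial):
    # Illustrative search space; adjust the ranges to your setup.
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 1e-3, log=True),
        "weight_decay": trial.suggest_float("weight_decay", 0.0, 0.1),
        "num_train_epochs": trial.suggest_int("num_train_epochs", 3, 15),
    }

trainer = Seq2SeqTrainer(
    model_init=model_init,  # must return a fresh model for each trial
    args=Seq2SeqTrainingArguments(
        output_dir="hp-search",  # placeholder path
        eval_strategy="epoch",   # `evaluation_strategy` on older transformers versions
    ),
    train_dataset=tokenized_train,
    eval_dataset=tokenized_val,
)

best_run = trainer.hyperparameter_search(
    direction="minimize",  # minimize validation loss
    backend="optuna",
    hp_space=hp_space,
    n_trials=10,
)
print(best_run.hyperparameters)
```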

Also, the `AdamW` implementation bundled with `transformers` was deprecated a while ago in favor of `torch.optim.AdamW`, so I think it’s better to switch the optimizer setting.
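Switching is just a flag on the training arguments (recent versions already default to the PyTorch implementation); a minimal sketch with placeholder values:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-news-summary",  # placeholder path
    optim="adamw_torch",                # torch.optim.AdamW instead of the deprecated transformers AdamW
    weight_decay=0.01,                  # placeholder value
)
```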


The following is by Hugging Chat:

To improve the performance of your news summarization model using Flan-T5, here are the key steps and considerations based on your thought process:

  1. Data Preprocessing and Tokenization:

    • Add Task Prefix: Consider prefixing your input texts with “summarize:” or a similar instruction to help the model understand the task clearly.
    • Adjust Token Length: Evaluate whether the current max_length is too restrictive. Increasing it slightly might allow the model to capture more context, especially in longer articles (see the preprocessing sketch after this list).
  2. Model Configuration:

    • Experiment with LoRA Rank: Try increasing the LoRA rank from 32 to perhaps 64 or 84. This could enhance the model’s ability to capture information, but monitor for overfitting (see the LoRA sketch after this list).
    • Learning Rate and Scheduler: Adjust the learning rate (e.g., try 1e-3, a common starting point for LoRA) and consider adding a learning-rate scheduler such as linear decay with a short warmup.
  3. Training Parameters:

    • Increase Epochs: Extend the number of training epochs to 15 or more to give the model more time to learn, while monitoring validation loss closely to prevent overfitting.
    • Data Collator Settings: Review and possibly adjust settings within DataCollatorForSeq2Seq to enhance data handling and processing efficiency.
  4. Regularization Techniques:

    • Dropout Adjustment: Increase the dropout parameter in LoRA from 0.05 to 0.1 to add more regularization.
    • Gradient Clipping: Implement gradient clipping to prevent exploding gradients and stabilize training (note that the Trainer already clips at max_grad_norm=1.0 by default).
  5. Training Strategy:

    • Gradient Accumulation: Use gradient accumulation steps to effectively increase the batch size without exceeding hardware limits.
    • Early Stopping: Incorporate early stopping to halt training when improvement stalls, preventing overfitting (the trainer sketch after this list wires these training settings together).
  6. Evaluation and Metrics:

    • Expand Metrics: While focusing on ROUGE scores, consider incorporating additional evaluation metrics (e.g., BERTScore) for a more comprehensive performance analysis.
  7. Hyperparameter Tuning:

    • Grid Search: Conduct a grid search to identify optimal hyperparameters, including learning rate, weight decay, and LoRA configurations (Trainer.hyperparameter_search, sketched near the top of this reply, can automate this).
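To make point 1 concrete, here is a rough preprocessing sketch. It assumes a `DatasetDict` named `raw_datasets` with `article` and `summary` columns; the column names, model size, and length limits are placeholders:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

prefix = "summarize: "     # task prefix the model sees before each article
max_source_length = 1024   # placeholder; raise it if your articles get truncated
max_target_length = 128    # placeholder

def preprocess(batch):
    # Prefix the inputs, then tokenize articles and reference summaries.
    inputs = tokenizer(
        [prefix + text for text in batch["article"]],
        max_length=max_source_length,
        truncation=True,
    )
    labels = tokenizer(
        text_target=batch["summary"],
        max_length=max_target_length,
        truncation=True,
    )
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = raw_datasets.map(
    preprocess,
    batched=True,
    remove_columns=raw_datasets["train"].column_names,
)
```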
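For points 2 and 4, a possible PEFT/LoRA setup with a higher rank and more dropout; all values are illustrative starting points, not tuned recommendations:

```python
from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForSeq2SeqLM

model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-base")

lora_config = LoraConfig(
    task_type=TaskType.SEQ_2_SEQ_LM,
    r=64,                       # raised from 32; watch validation loss for overfitting
    lora_alpha=128,             # often set to roughly 2x the rank
    lora_dropout=0.1,           # raised from 0.05 for extra regularization
    target_modules=["q", "v"],  # T5 attention projections commonly targeted
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```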
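And for points 3 to 6, a sketch of the trainer setup that builds on the two sketches above (`tokenized`, `tokenizer`, `model`); every hyperparameter value here is a placeholder to tune, not a recommendation:

```python
import numpy as np
import evaluate
from transformers import (
    DataCollatorForSeq2Seq,
    EarlyStoppingCallback,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# Dynamic padding per batch; label_pad_token_id=-100 keeps padding out of the loss.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model, label_pad_token_id=-100)

rouge = evaluate.load("rouge")

def compute_metrics(eval_pred):
    preds, labels = eval_pred
    if isinstance(preds, tuple):
        preds = preds[0]
    # Replace -100 (ignored label positions) before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    return rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-news-summary",  # placeholder path
    optim="adamw_torch",
    learning_rate=1e-3,                 # placeholder, as suggested above
    lr_scheduler_type="linear",
    warmup_ratio=0.05,
    weight_decay=0.01,
    num_train_epochs=15,
    per_device_train_batch_size=8,
    gradient_accumulation_steps=4,      # effective batch size of 32 without more memory
    max_grad_norm=1.0,                  # gradient clipping (also the default)
    predict_with_generate=True,
    generation_max_length=128,
    eval_strategy="epoch",              # `evaluation_strategy` on older transformers versions
    save_strategy="epoch",
    load_best_model_at_end=True,
    metric_for_best_model="eval_loss",
    greater_is_better=False,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],  # stop when eval loss stalls
)

trainer.train()
```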

By systematically addressing these areas, you can enhance your model’s performance, potentially leading to lower validation loss and improved ROUGE scores. Monitoring each change closely will help determine the most effective adjustments for your specific dataset and task.