Hey everyone… I'm working on a personal project on news summarization, fine-tuning the Flan-T5 base model on the gopalkalpande/bbc-news-summary dataset from Hugging Face. For the max lengths I use the 95th percentile of the token lengths, which covers 95% of my inputs and targets. Below is my training setup.
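For reference, this is roughly how I pick that 95th-percentile cut-off (just a sketch; the Articles/Summaries column names and the google/flan-t5-base checkpoint are assumptions about my notebook):

import numpy as np
from datasets import load_dataset
from transformers import AutoTokenizer

# sketch: tokenize every article/summary and take the 95th percentile as the max length
ds = load_dataset("gopalkalpande/bbc-news-summary", split="train")
tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

input_lens = [len(tokenizer(a)["input_ids"]) for a in ds["Articles"]]
target_lens = [len(tokenizer(s)["input_ids"]) for s in ds["Summaries"]]

max_source_length = int(np.percentile(input_lens, 95))
max_target_length = int(np.percentile(target_lens, 95))
print(max_source_length, max_target_length)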
from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments
from transformers.data.data_collator import default_data_collator
from transformers import DataCollatorForSeq2Seq
from transformers import get_scheduler
from torch.optim import AdamW
import time

data_collator = DataCollatorForSeq2Seq(
    tokenizer=tokenizer,
    model=model,
    label_pad_token_id=-100  # padded label positions are ignored by the loss
)
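The tokenized_train_ds / tokenized_val_ds that the trainer gets below come from a preprocessing step roughly like this (simplified sketch; the "summarize: " prefix, the train_ds/val_ds variable names and the Articles/Summaries columns are assumptions about my notebook):

def preprocess(batch):
    # sketch of the tokenization step, using the 95th-percentile lengths from above
    inputs = tokenizer(
        ["summarize: " + doc for doc in batch["Articles"]],
        max_length=max_source_length,
        truncation=True,
    )
    labels = tokenizer(
        text_target=batch["Summaries"],
        max_length=max_target_length,
        truncation=True,
    )
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized_train_ds = train_ds.map(preprocess, batched=True, remove_columns=train_ds.column_names)
tokenized_val_ds = val_ds.map(preprocess, batched=True, remove_columns=val_ds.column_names)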
output_dir = f'./news-sum-training-{str(int(time.time()))}'
train_args = Seq2SeqTrainingArguments(
    output_dir=output_dir,
    num_train_epochs=10,
    auto_find_batch_size=True,
    eval_strategy="epoch",
    learning_rate=1e-4,
    weight_decay=0.01,
    logging_steps=10,
    fp16=False,
    predict_with_generate=True,
    save_strategy="epoch",
    load_best_model_at_end=True
)
trainer = Seq2SeqTrainer(
    model=peft_model,
    args=train_args,
    train_dataset=tokenized_train_ds,
    eval_dataset=tokenized_val_ds,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics
)
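compute_metrics is the usual ROUGE computation with the evaluate library; a minimal sketch of what it looks like (details may differ slightly from my exact notebook):

import numpy as np
import evaluate

rouge = evaluate.load("rouge")

def compute_metrics(eval_preds):
    preds, labels = eval_preds
    if isinstance(preds, tuple):
        preds = preds[0]
    # put the tokenizer's pad token back in place of the -100 label padding before decoding
    preds = np.where(preds != -100, preds, tokenizer.pad_token_id)
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
    return {k: round(v * 100, 2) for k, v in result.items()}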
These are my losses and ROUGE scores:
I'm doing LoRA fine-tuning, and that config is below:
from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
    r=32,
    lora_alpha=32,
    target_modules=["q", "v"],  # query and value projections in T5 attention
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM
)
peft_model = get_peft_model(model, lora_config)
peft_model.print_trainable_parameters()
Now I need help decreasing the validation loss further and raising the ROUGE scores to an acceptable level. Can anyone help or guide me on what to do in this situation? Thanks in advance.