T5 Finetuning not converging

Hi everyone!

I am new to the world of transformers and NLP, and I am having a problem fine-tuning T5 for my specific use case.

What I want to achieve is that the model receives an input text and outputs a JSON string with the relevant information from the text.

There are three formats the model can respond with; below are some examples:
Input: Hey, can you give one hundred dollars to John?
Expected Output: '{"action": "T", "data": {"name": "John", "amount": 100, "currency": "USD"}}'

Input: I want to add Benjamin Franklin to my contacts. He has an account at Citibank, with number 412389124.
Expected Output: '{"action": "A", "data": {"name": "Benjamin Franklin", "account_no": 412389124, "entity": "Citibank", "id_num": null}}'

Input: Hey, what's the weather gonna be tonight?
Expected Output: '{"action": "N", "data": {}}'

I’ve built a Python script to generate input/label pairs with as much variety as possible. With that script, I generated 20,000 data points (I can generate fewer or more than that).
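To give an idea, here is a heavily simplified sketch of what the generator does for the transfer action (the names, templates, and helper below are hypothetical stand-ins; the real script has far more variety):

import json
import random

# Hypothetical, trimmed-down pools; the real script has many more.
names = ["John", "Benjamin Franklin", "Maria Lopez"]
templates = [
    "Hey, can you give {amount} dollars to {name}?",
    "Please transfer {amount} USD to {name}.",
]

def make_transfer_example():
    # Build one (input, label) pair for the "T" (transfer) action.
    name = random.choice(names)
    amount = random.randint(1, 1000)
    text = random.choice(templates).format(amount=amount, name=name)
    label = json.dumps(
        {"action": "T", "data": {"name": name, "amount": amount, "currency": "USD"}}
    )
    return {"input_text": text, "target_text": label}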

Using T5 as my base model, I’ve trained it with the Seq2SeqTrainer from Hugging Face Transformers.

Below is my code:

import numpy as np
import evaluate
from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
    T5ForConditionalGeneration,
    T5Tokenizer,
)

model_name_huggingface = "t5-base"  # Hub id; "google/t5-base" is not a valid repo

tokenizer = T5Tokenizer.from_pretrained(model_name_huggingface)
model = T5ForConditionalGeneration.from_pretrained(model_name_huggingface)

Then I tokenize my dataset.
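Roughly, the tokenization looks like this (simplified; the max lengths are values I picked, and the column names come from my generator script):

max_source_length = 128
max_target_length = 128

def preprocess(batch):
    # Tokenize the inputs; padding is handled later by DataCollatorForSeq2Seq.
    model_inputs = tokenizer(
        batch["input_text"], max_length=max_source_length, truncation=True
    )
    # Tokenize the JSON targets as labels.
    labels = tokenizer(
        text_target=batch["target_text"], max_length=max_target_length, truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_chimi_dataset = chimi_dataset.map(preprocess, batched=True)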

batch_size = 16

training_args = Seq2SeqTrainingArguments(
    output_dir="models/chimi-mt5-base",
    evaluation_strategy="steps",
    eval_steps=100,
    logging_strategy="steps",
    logging_steps=100,
    save_strategy="steps",
    save_steps=200,
    # learning_rate=1e-4,
    optim="adafactor",
    learning_rate=5e-4,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    predict_with_generate=True,
    weight_decay=0.05,
    save_total_limit=3,
    num_train_epochs=2,
    metric_for_best_model="exact_match",
    # greater_is_better=False,
    load_best_model_at_end=True
)
data_collator = DataCollatorForSeq2Seq(tokenizer=tokenizer, model=model)

cer = evaluate.load("cer", module_type="metric")
exact_match = evaluate.load("exact_match", module_type="metric")

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)

    # Replace -100 in the labels as we can't decode them.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    result = {}

    # Compute CER
    result["cer"] = cer.compute(predictions=decoded_preds, references=decoded_labels)

    # Compute Exact Match
    exact_match_res = exact_match.compute(predictions=decoded_preds, references=decoded_labels, ignore_case=True)
    result["exact_match"] = exact_match_res["exact_match"]

    return {k: round(v, 4) for k, v in result.items()}
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_chimi_dataset["train"],
    eval_dataset=tokenized_chimi_dataset["validation"],
    data_collator=data_collator,
    tokenizer=tokenizer,
    compute_metrics=compute_metrics
)
result = trainer.train()

That’s the current code I am using to fine-tune T5.

The training loss goes down to 0.054 and never improves.
The validation loss goes down to 0.034 and never improves.
The CER goes down to 0.4875 and never improves after that; for reference, it already reaches 0.583 after the first 100 steps.
The exact match goes up to 0.3089, and it is already at that level by step 600.

When testing, I see that it responds in the correct JSON format, and the action is usually predicted correctly. But the data inside the JSON is often wrong.
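For reference, this is roughly how I test it (a quick sanity check rather than my full evaluation):

input_text = "Hey, can you give one hundred dollars to John?"
inputs = tokenizer(input_text, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))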

What can I do to improve this?
I have been stuck on this for a long time, and I am not really sure how to proceed. Any help is appreciated.

Thanks in advance!