Replicating SQuAD results on T5

Hi, I’m trying to replicate the SQuAD experiment in the T5 paper. I’m following the paper’s recommended hyperparameters for fine-tuning:

  • AdaFactor optimizer
  • Batch size 128 (16 per GPU across 8× RTX 3090s)
  • 2^18 fine-tuning steps (around 300 epochs)
  • Max sequence length 512
  • Learning rate 0.001

I’m running the following:

python run_seq2seq_qa.py \
  --model_name_or_path t5-base \
  --dataset_name squad \
  --context_column context \
  --question_column question \
  --answer_column answers \
  --do_train \
  --do_eval \
  --per_device_train_batch_size 16 \
  --optim adafactor \
  --learning_rate 0.001 \
  --num_train_epochs 300 \
  --evaluation_strategy epoch \
  --max_seq_length 512 \
  --predict_with_generate \
  --output_dir /tmp/t5_squad/ \
  --overwrite_output_dir
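For reference, my understanding from reading the Trainer source is that --optim adafactor builds the optimizer roughly like this (just a sketch; the exact defaults may differ across transformers versions, and model is a placeholder for the loaded checkpoint):

from transformers.optimization import Adafactor

# Roughly what the Trainer constructs for --optim adafactor:
optimizer = Adafactor(
    model.parameters(),      # `model` is a placeholder for the loaded T5
    lr=1e-3,                 # taken from --learning_rate 0.001
    scale_parameter=False,   # Trainer's default for --optim adafactor
    relative_step=False,     # use the fixed lr, not Adafactor's own schedule
)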

After 4 epochs, the validation Exact Match score is 79.054 and F1 is 86.895; beyond that point, the model starts to overfit and performance decreases. However, the paper reports 85.44 EM and 92.08 F1 for T5-Base (Table 14).

Has anyone been able to reproduce the official paper results, or am I missing something?

Hi, I’m also having trouble replicating the SQuAD results.

I fine-tuned the t5-base and google/t5-v1_1-base checkpoints on SQuAD. Using AdaFactor with lr 0.001 and gradient accumulation, I got 79.4 EM and 87.81 F1, which are close to your results.

I also tried other hyperparameter settings; the best result was 84.68 EM and 91.56 F1 (AdaFactor, lr 8e-5, batch size 16, no gradient accumulation), which is still a little lower than the reported 85.44 EM and 92.08 F1.
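For concreteness, here is roughly how that best run maps onto the script’s training arguments (a sketch; the output path is a placeholder and everything unlisted stays at its default):

from transformers import Seq2SeqTrainingArguments

# Hypothetical reconstruction of the best-scoring run above.
training_args = Seq2SeqTrainingArguments(
    output_dir="/tmp/t5_squad_lr8e-5",  # placeholder path
    optim="adafactor",
    learning_rate=8e-5,
    per_device_train_batch_size=16,
    gradient_accumulation_steps=1,      # no gradient accumulation
    predict_with_generate=True,
)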

I’m still trying to find what went wrong. :thinking:

Setting scale_parameter=True in Adafactor results in 84.2 EM and 91.1 F1.
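Since --optim adafactor appears to hard-code scale_parameter=False (see the sketch earlier in the thread), you have to build the optimizer yourself and hand it to the trainer. A minimal sketch of how I did it, where model, training_args, and the datasets are placeholders for the usual setup:

from transformers import Seq2SeqTrainer
from transformers.optimization import Adafactor

# Adafactor with per-parameter update scaling re-enabled; relative_step=False
# means the fixed lr below is used instead of Adafactor's internal schedule.
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=True,   # the change that recovered ~84 EM / ~91 F1 for me
    relative_step=False,
    warmup_init=False,
)

trainer = Seq2SeqTrainer(
    model=model,                   # placeholder model
    args=training_args,            # placeholder Seq2SeqTrainingArguments
    train_dataset=train_dataset,   # placeholder datasets
    eval_dataset=eval_dataset,
    optimizers=(optimizer, None),  # None: Trainer creates its own lr scheduler
)

If I understand the Adafactor paper correctly, scale_parameter=True scales each update by the parameter’s root-mean-square, which may be why the high 1e-3 learning rate behaves better with it enabled.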