Hi, I am currently trying to fine-tune T5 on XSum using a TPU. I trained t5-small, t5-base, and t5-large for 3 epochs each, and in every case the training loss stays nearly constant and the validation loss does not change at all.
t5-small example:

| Epoch | Training Loss | Validation Loss |
|-------|---------------|-----------------|
| 1     | 9.120000      | 12.475000       |
| 2     | 9.200000      | 12.475000       |
| 3     | 8.960000      | 12.475000       |
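For reference, this is roughly the kind of setup I am running (a simplified sketch using the standard `Seq2SeqTrainer`; the TPU launch via `xla_spawn` is omitted, and the hyperparameters, max lengths, and batch size shown here are illustrative, not necessarily the exact values from my runs):

```python
# Rough sketch of the fine-tuning setup (illustrative hyperparameters).
from datasets import load_dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model_name = "t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

raw = load_dataset("xsum")

def preprocess(batch):
    # T5 uses a task prefix for summarization.
    inputs = tokenizer(
        ["summarize: " + doc for doc in batch["document"]],
        max_length=512,
        truncation=True,
    )
    labels = tokenizer(
        text_target=batch["summary"], max_length=64, truncation=True
    )
    inputs["labels"] = labels["input_ids"]
    return inputs

tokenized = raw.map(
    preprocess, batched=True, remove_columns=raw["train"].column_names
)

args = Seq2SeqTrainingArguments(
    output_dir="t5-small-xsum",
    num_train_epochs=3,
    per_device_train_batch_size=8,
    learning_rate=1e-4,
    evaluation_strategy="epoch",
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```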
I would like to know if I can get a rough estimate of how many steps or epochs are needed for each model to converge. The ROUGE scores I am getting are nowhere near the authors' reported numbers.
Thank you