T5 evaluation via Trainer `predict_with_generate` extremely slow on TPU?

+1, I have a similar issue when fine-tuning on GPU. Training doesn't take long, but prediction on the development set takes far too long.