Transformer generate function has low GPU utilization

Hi all,
I am doing bi-level training with a T5 transformer, and at each step I need to call model.generate to produce synthetic data for training the second model.
However, when I call generate during training, my GPU utilization drops from 90% to 30%. Even with the beam width set to 1, it only reaches 35%.
Doesn’t model.generate use the GPU? Why is the utilization so low?
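
For reference, here is a minimal sketch of the kind of generate call described above. It assumes a tiny, randomly initialized T5 (the config sizes, batch shape, and variable names are hypothetical, not taken from the actual training code) so that it runs without downloading weights. It only checks that the generated tokens live on the same device as the model; it does not measure utilization.

```python
import torch
from transformers import T5Config, T5ForConditionalGeneration

# Hypothetical tiny config so the sketch runs without downloading a checkpoint.
config = T5Config(vocab_size=128, d_model=32, d_kv=8, d_ff=64,
                  num_layers=1, num_heads=4, decoder_start_token_id=0)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = T5ForConditionalGeneration(config).to(device)
model.eval()

# Fake batch of token ids; in real training these would come from the tokenizer.
input_ids = torch.randint(2, config.vocab_size, (8, 16), device=device)
attention_mask = torch.ones_like(input_ids)

# Greedy decoding (beam width 1). Tokens are produced one decoding step at a time.
with torch.no_grad():
    synthetic = model.generate(input_ids, attention_mask=attention_mask,
                               max_new_tokens=10, num_beams=1, do_sample=False)

# The output sits on the same device as the model and the inputs.
print(synthetic.device, tuple(synthetic.shape))
```

If the printed device is cuda, generate is running on the GPU. Because decoding is autoregressive and emits one token per step, each step is a small amount of work, which may be part of why utilization looks lower than during a normal forward/backward pass.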

Hi, I am having the same problem. Did you find the answer? Thanks