Hello,
I am trying to run greedy search with t5-small, following the example listed here. This simple operation results in a CUDA out-of-memory error on a V100 (32 GB VRAM),
whereas calling the generate function with the same model and the same inputs works fine.
Any idea what could be causing this issue? Help is much appreciated!