Compute VRAM size for Text2Text text generation

Hello,

How would I go about calculating or estimating the VRAM needed to fine-tune models such as flan-t5-large or mt5-large for seq2seq text generation, taking the input and output sequence lengths into account? Let's assume an input length of 1000 tokens and a target length of 2000 tokens, with mt5-large as the model. How do I compute the amount of GPU memory needed with a minimal batch size of 1, and how does the requirement grow with each increment of the batch size?
I tried using mt5-small on a dual-GPU setup (2 × 20 GB) but ran out of memory despite mixed precision and a batch size of 1, apparently because of the number of prediction steps. When I reduced the target length to a few dozen tokens, everything worked perfectly, but a few dozen tokens are simply not enough…
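For reference, here is a back-of-envelope sketch of the kind of estimate I have in mind (not an exact formula, and the constants are guesses): Adam keeps fp32 master weights plus two moment buffers (~12 bytes per parameter), mixed precision adds fp16 weights and gradients (~4 bytes per parameter), and activations scale roughly with batch size × total sequence length × hidden size × number of layers. The activation multiplier `C` below is a made-up fudge factor, and the mt5-large figures in the usage example are approximate.

```python
def estimate_finetune_vram_gb(
    n_params: float,      # total model parameters
    batch_size: int,
    src_len: int,         # input sequence length in tokens
    tgt_len: int,         # target sequence length in tokens
    hidden: int,          # model dimension (d_model)
    n_layers: int,        # encoder + decoder layers combined
    mixed_precision: bool = True,
) -> float:
    """Rough VRAM estimate (GB) for Adam fine-tuning; ignores
    quadratic attention-score memory and framework overhead."""
    # fp32 master weights + Adam first/second moments: 3 * 4 bytes/param
    bytes_optimizer = n_params * 12
    # fp16 weights + fp16 grads under mixed precision, else fp32 copies
    bytes_weights_grads = n_params * (4 if mixed_precision else 8)
    # activations: crude per-layer multiplier for attention/FFN intermediates
    C = 16  # assumed fudge factor, not measured
    bytes_per_act = 2 if mixed_precision else 4
    seq = src_len + tgt_len
    bytes_activations = C * batch_size * seq * hidden * n_layers * bytes_per_act
    return (bytes_optimizer + bytes_weights_grads + bytes_activations) / 1024**3

# Approximate mt5-large shape: ~1.2B params, d_model=1024, 24+24 layers
print(estimate_finetune_vram_gb(1.2e9, 1, 1000, 2000, 1024, 48))
```

Under these assumptions the optimizer state alone (~14 GB) already dominates a single 20 GB card, and only the activation term grows (roughly linearly) with each increment of the batch size. Is this the right way to think about it, or am I missing a major term?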