How to predict the memory requirements for a given model?

Hi! I was wondering whether there is an exact way, or at least a rule of thumb, to determine the GPU memory required to train a model given the input and output sequence lengths (I’m specifically interested in seq2seq models), the configuration, and the model type. Also, are there any good practices for reducing this requirement?
Thanks.
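For context on what such a rule of thumb might look like: a common back-of-the-envelope estimate (not an exact method) is roughly 16 bytes per parameter for plain fp32 training with Adam, i.e. 4 bytes for the weights, 4 for the gradients, and 8 for the two optimizer moment states, plus an activation term that grows with batch size and sequence length. The sketch below is a hypothetical illustration of that rule only; the function name, the `activation_overhead` fudge factor, and the example parameter count are all made up for demonstration, not taken from any library.

```python
# Back-of-the-envelope GPU memory estimate, assuming plain fp32
# training with the Adam optimizer (a hypothetical sketch, not an
# official formula):
#   - weights:       4 bytes/param
#   - gradients:     4 bytes/param
#   - Adam moments:  8 bytes/param (two fp32 states)
# Activations depend on batch size, sequence length, and architecture,
# so they are folded into a rough multiplicative overhead here.

def estimate_training_memory_gib(num_params: int,
                                 activation_overhead: float = 0.2) -> float:
    """Rule-of-thumb GPU memory for fp32 + Adam training, in GiB."""
    bytes_per_param = 4 + 4 + 8  # weights + gradients + optimizer states
    model_bytes = num_params * bytes_per_param
    total_bytes = model_bytes * (1 + activation_overhead)
    return total_bytes / 1024**3

# Example: a ~220M-parameter seq2seq model (roughly t5-base sized)
print(f"{estimate_training_memory_gib(220_000_000):.1f} GiB")  # ~3.9 GiB
```

Note that the activation term is the part most sensitive to the sequence lengths mentioned above, so for long sequences it can dominate the fixed per-parameter cost.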
