How can I predict the memory requirements for a given model?

Hi! I was wondering whether there is an exact method, or at least a rule of thumb, for determining the GPU memory required to train a model, given the input and output sequence lengths (I'm specifically interested in seq2seq models), the configuration, and the model type. Also, are there any good practices for reducing this requirement?

