How much memory is needed for mbart-large-cc25?

I tried to fine-tune without freezing any layers, but a 16 GB V100 ran out of memory even with batch size 1 and max source length 512.
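For a rough sense of why 16 GB is tight: mbart-large-cc25 has roughly 610M parameters, and standard fp32 training with Adam keeps weights, gradients, and two optimizer moment buffers per parameter, before counting any activations. A back-of-the-envelope sketch (the parameter count and byte figures are approximations, not exact numbers):

```python
# Rough training-memory estimate for a large model, excluding activations.
# Assumes fp32 weights/gradients (4 bytes each) and Adam's two fp32
# moment buffers (8 bytes per parameter) -- approximate figures only.
def training_memory_gb(n_params, bytes_weights=4, bytes_grads=4, bytes_optim=8):
    total_bytes = n_params * (bytes_weights + bytes_grads + bytes_optim)
    return total_bytes / 1024**3

n_params = 610e6  # approximate parameter count of mbart-large-cc25
print(f"~{training_memory_gb(n_params):.1f} GB before activations")
```

That already lands around 9 GB before a single activation is stored, so long sequences at even batch size 1 can push past 16 GB, which matches the OOM above.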

I am trying to run the mBART En-Ro example and I get CUDA OOM even with max_len 64, n_train 5000, and batch size 1 (on Google Colab with a 15 GB GPU, P8).
Has anyone managed to run it with under 16 GB?
