How can I load large models like google/mt5-xl on a GPU

Hi all

Was anyone able to load large models like google/mt5-xl on a GPU instance?

Although the model size does not exceed 15 GB, I failed to train it (with batch size = 1) on a GPU instance with an A6000 (48 GB of GPU memory).

Did anyone do it before?

Is there any explanation for why it consumes more GPU memory than its actual size?

Thank you all

Hi Abu

Loading Transformer models requires significantly more GPU memory than just the model size:

  • For starters, just loading the model needs at least 2x the model size: once for the initial weights and once for the checkpoint that is loaded into them
  • Apart from the model parameters, there are also the gradients, optimizer states, and activations taking up memory, so during training the actual memory usage will likely be more than 4x the model size (see the back-of-the-envelope sketch below)
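To make that concrete, here is a rough back-of-the-envelope calculation. The ~3.7B parameter count for mt5-xl and plain fp32 training with Adam are my assumptions; activations come on top of this and depend on batch size and sequence length:

```python
# Rough estimate of training memory for google/mt5-xl (assumed ~3.7B parameters),
# assuming plain fp32 training with the Adam optimizer:
#   4 bytes/param weights + 4 bytes/param gradients + 8 bytes/param optimizer states
GiB = 1024 ** 3
n_params = 3.7e9            # assumed parameter count of google/mt5-xl

weights = n_params * 4      # fp32 weights (roughly the ~15 GB checkpoint size)
gradients = n_params * 4    # fp32 gradients
adam_states = n_params * 8  # Adam keeps two fp32 moments per parameter

total = weights + gradients + adam_states
print(f"weights:   {weights / GiB:5.1f} GiB")
print(f"gradients: {gradients / GiB:5.1f} GiB")
print(f"optimizer: {adam_states / GiB:5.1f} GiB")
print(f"total:     {total / GiB:5.1f} GiB before activations")
# ~55 GiB before activations – already more than the 48 GB of an A6000
```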

In order to train such large models, you will have to use some sort of model parallelism, as explained in this blog post.
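If your immediate goal is just to get the model loaded for inference, one simple option (not the approach from the blog post) is to let Accelerate spread the layers over the available devices with `device_map="auto"`. This is only a sketch and does not by itself solve the training-memory problem above:

```python
# Minimal sketch: loading mt5-xl with the weights spread across the available GPUs
# (and CPU RAM, if needed) via accelerate's device_map="auto".
# This helps with loading/inference; training still needs model parallelism
# or a ZeRO-style optimizer-sharding setup.
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-xl")
model = MT5ForConditionalGeneration.from_pretrained(
    "google/mt5-xl",
    device_map="auto",   # requires `pip install accelerate`
)
```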

You can find an example of how to train GPT-J (~24 GB) here: amazon-sagemaker-examples/train_gptj_smp_notebook.ipynb at main · aws/amazon-sagemaker-examples · GitHub
(this example uses Amazon SageMaker to distribute the model over several GPUs).
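For reference, the SageMaker side of that approach boils down to enabling the model-parallel distribution on the HuggingFace estimator. The sketch below is only an outline: the entry point, instance type, library versions, and partition counts are placeholders, and the linked notebook has the actual configuration:

```python
# Sketch of launching a model-parallel training job with the SageMaker
# HuggingFace estimator. All concrete values (script, versions, instance
# type, partition counts) are placeholders; see the linked notebook.
from sagemaker.huggingface import HuggingFace

smp_options = {
    "enabled": True,
    "parameters": {
        "partitions": 4,          # split the model over 4 GPUs
        "microbatches": 4,
        "pipeline": "interleaved",
        "optimize": "speed",
        "ddp": True,
    },
}
mpi_options = {"enabled": True, "processes_per_host": 8}

estimator = HuggingFace(
    entry_point="train.py",            # your training script
    source_dir="./scripts",
    instance_type="ml.p3.16xlarge",    # 8 x V100 (16 GB each)
    instance_count=1,
    role="<your-sagemaker-execution-role>",
    transformers_version="4.17",       # placeholder versions
    pytorch_version="1.10",
    py_version="py38",
    distribution={
        "smdistributed": {"modelparallel": smp_options},
        "mpi": mpi_options,
    },
)
estimator.fit()
```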

Hope that helps.

Cheers
Heiko

Thanks, @marshmellow77 for your detailed answer.