OPT Memory problem

Hi!

I’m trying to replicate the basic OPT examples from the documentation and I keep getting a CUDA out-of-memory error. I tried using low_cpu_mem_usage=True, since that has fixed this on other models, but it doesn’t make a difference here.

Code is basic:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load OPT-30B in fp16 and move it onto the GPU
model = AutoModelForCausalLM.from_pretrained("facebook/opt-30b", torch_dtype=torch.float16).cuda()

And an example error is:

I know it’s a large model, but PyTorch reserving 43 GiB seems high. None of the solutions I can find on outside forums seem applicable to this model type (running smaller batches, clearing memory mid-run, or using Koila wrappers). Any help much appreciated!

The facebook/opt-30b model takes about 60 GB of memory in FP16 precision (30B parameters × 2 bytes per parameter), so the weights alone won’t fit on a 43 GiB GPU.
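
One option worth trying (a minimal, untested sketch, assuming the accelerate package is installed) is to let Transformers shard the fp16 weights across the GPU and CPU RAM with device_map="auto" instead of calling .cuda():

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# device_map="auto" (requires the accelerate package) places as many fp16
# weights as fit on the GPU and offloads the rest to CPU RAM, instead of
# trying to put all ~60 GB on one device.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-30b",
    torch_dtype=torch.float16,
    device_map="auto",
)

tokenizer = AutoTokenizer.from_pretrained("facebook/opt-30b")
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(0)  # inputs on the first GPU
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))

Offloading to CPU is slow for generation, but it should at least get past the OOM.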

Hi,

If your machine has multiple GPUs installed, loading the model with Data Parallel or Distributed Data Parallel may help.

Regards.
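
One caveat: torch.nn.DataParallel and DDP replicate the full model on every GPU, so each card would still need room for all ~60 GB of weights. To actually split the weights across several GPUs you can again use device_map (a hedged sketch, assuming two GPUs and the accelerate package; the memory caps are illustrative, not required values):

from transformers import AutoModelForCausalLM
import torch

# Shard the fp16 weights across two GPUs (model parallelism), capping how
# much each card may hold; anything that doesn't fit under the caps is
# offloaded to CPU RAM. Adjust the caps to your hardware.
model = AutoModelForCausalLM.from_pretrained(
    "facebook/opt-30b",
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "40GiB", 1: "40GiB", "cpu": "100GiB"},
)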