I’m trying to replicate the basic OPT examples from the documentation and I keep getting a CUDA out-of-memory error. I tried using low_cpu_mem_usage=True, since that has been a solution on other models, but it doesn’t make a difference.
Code is basic:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("facebook/opt-30b", torch_dtype=torch.float16).cuda()
And an example error is:
I know it’s a large model, but PyTorch reserving 43 GiB seems high. None of the solutions I can find on outside forums seem applicable to this model type (running smaller batches, clearing memory mid-run, or using koila wrappers). Any help much appreciated!
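For reference, here’s my rough back-of-envelope math on the fp16 weight size (assuming the model is ~30B parameters, which I’m inferring from the name, so I may be off):

import torch  # not strictly needed for the arithmetic, just matching my setup

# Approximate memory for the model weights alone in fp16,
# ignoring activations, CUDA context, and fragmentation.
params = 30e9          # assumed parameter count for opt-30b
bytes_per_param = 2    # fp16 = 2 bytes per parameter
gib = params * bytes_per_param / 2**30
print(f"{gib:.1f} GiB")  # ~55.9 GiB just for the weights

So if that math is right, even the fp16 weights alone wouldn’t fit on a single 40 GiB card, which makes me wonder whether the 43 GiB reservation is actually expected here. Am I computing this correctly?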