Hi,
I have a large model that does not fit on the GPU, so I am loading it as follows:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
kwargs = {"device_map": "balanced", "torch_dtype": torch.float16}  # let Accelerate place modules across devices
max_memory = {0: "10GiB", "cpu": "99GiB"}  # cap GPU 0 at 10GiB and offload the rest to CPU RAM
model_name = 'facebook/opt-6.7b'
config = AutoConfig.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, config=config, max_memory=max_memory, **kwargs)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
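After loading, this is roughly how I check how much GPU memory is in use (just my own monitoring snippet, not part of the problem):
print(f"GPU 0 allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GiB")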
Now, when I try to move the model back to the CPU to free up GPU memory for other processing:
model = model.to('cpu')
torch.cuda.empty_cache()
I get the following error:
NotImplementedError: Cannot copy out of meta tensor; no data!
How can I free up GPU memory?
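Is the right approach instead to drop the model object entirely and clear the cache, something like the sketch below? (Just my guess; I have not confirmed this works with an offloaded model.)
import gc
del model            # drop the last reference to the model
gc.collect()         # make sure Python actually releases it
torch.cuda.empty_cache()  # return cached blocks to the GPU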
When I check model.hf_device_map, I see that some layers are on the GPU while others are on the CPU.
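For reference, this is how I summarize the device map (the Counter part is just my own convenience):
from collections import Counter
print(Counter(model.hf_device_map.values()))  # number of modules assigned to each device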
Thanks!