Move model with device_map="balanced" to CPU

Hi,

I have a large model that doesn't fit in GPU memory, so I am loading it with a device map as follows:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig

kwargs = {"device_map": "balanced", "torch_dtype": torch.float16}
# Cap GPU 0 at 10 GiB and let the remaining weights spill over to CPU RAM.
max_memory = {0: "10GiB", "cpu": "99GiB"}

model_name = 'facebook/opt-6.7b'
config = AutoConfig.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, config=config, max_memory=max_memory, **kwargs)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
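
In case it's relevant: I believe the same kind of split can be previewed without loading any weights, using Accelerate's infer_auto_device_map (a sketch; the exact map depends on the hardware):

from accelerate import infer_auto_device_map, init_empty_weights

# Build the model structure on the meta device (no weights allocated).
with init_empty_weights():
    empty_model = AutoModelForCausalLM.from_config(config)

# Compute a placement under the same memory limits that from_pretrained uses.
print(infer_auto_device_map(empty_model, max_memory=max_memory, dtype=torch.float16))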

Now I want to move the model entirely to CPU to free up GPU memory for other processing:

model = model.to('cpu')
torch.cuda.empty_cache()

but this raises:

NotImplementedError: Cannot copy out of meta tensor; no data!

How can I free up GPU memory?
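
For now, the only workaround I've found is to delete the model entirely and reload it on CPU later, which seems wasteful (a sketch, assuming reloading is acceptable):

import gc

# Drop all references to the dispatched model, then reclaim the memory.
del model
gc.collect()
torch.cuda.empty_cache()

# If the model is needed again, reload it fully on CPU.
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map={"": "cpu"}
)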

When I check model.hf_device_map, I see that some layers are on GPU while others are on CPU.
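
For example, something like this (illustrative output; the exact modules on each device vary):

print(model.hf_device_map)
# {'model.decoder.embed_tokens': 0, 'model.decoder.layers.0': 0, ...,
#  'model.decoder.layers.31': 'cpu', 'lm_head': 'cpu'}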

Thanks!


I'm also running into this with device_map='auto'.