Hi,
I have a large model that does not fit on the GPU, so I am loading it as follows:
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, AutoConfig
kwargs = {"device_map": "balanced", "torch_dtype": torch.float16}  # let Accelerate place modules across devices
max_memory = {0: "10GiB", "cpu": "99GiB"}  # cap GPU 0 at 10GiB and offload the rest to CPU RAM
model_name = 'facebook/opt-6.7b'
config = AutoConfig.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, config=config, max_memory=max_memory, **kwargs)
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
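After loading, this is roughly how I check how much GPU memory is in use (just my own monitoring snippet, not part of the problem):
print(f"GPU 0 allocated: {torch.cuda.memory_allocated(0) / 1024**3:.2f} GiB")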
Now, when I try to move the model back to the CPU to free up GPU memory for other processing:
model = model.to('cpu')
torch.cuda.empty_cache()
I get the following error:
NotImplementedError: Cannot copy out of meta tensor; no data!
How can I free up GPU memory?
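Is the right approach instead to drop the model object entirely and clear the cache, something like the sketch below? (Just my guess; I have not confirmed this works with an offloaded model.)
import gc
del model            # drop the last reference to the model
gc.collect()         # make sure Python actually releases it
torch.cuda.empty_cache()  # return cached blocks to the GPU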
When I check model.hf_device_map, I see that some layers are on the GPU while others are on the CPU.
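For reference, this is how I summarize the device map (the Counter part is just my own convenience):
from collections import Counter
print(Counter(model.hf_device_map.values()))  # number of modules assigned to each device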
Thanks!