Hello community, greetings!
This is my first post on the forums, and I have a quick question.
The docs say:

> "Let's say you want to load the bigscience/bloom-1b7 model, and you have just enough GPU RAM to fit the entire model except the lm_head. You can therefore write a custom device_map as follows:"

```python
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}
```
This snippet is taken from the Quantize Transformers models page (huggingface.co).
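For context, this is how I understand the map being used: each key is a top-level module name and each value is the device it goes to (a GPU index or `"cpu"`). A minimal sketch, with the actual checkpoint load left commented out since it needs the download and a GPU:

```python
# Custom device_map: everything on GPU 0 except lm_head, which is
# offloaded to CPU (taken from the quantization docs snippet).
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}

# To actually load with it (downloads the checkpoint, needs a GPU):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "bigscience/bloom-1b7", device_map=device_map
# )
```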
How can one know the architecture of a particular model before actually trying to set up a custom device_map?
If I try help(model) to inspect its architecture, in general I won't be able to load the model onto a GPU in the first place. If that's the case, how can I know which layers to offload between CPU and GPU in order to quantize it?
Let me know if anything in my question needs clarification.
Thanks!