Setting up my custom device map for an LLM

Hello community, greetings!
This is my first post on the forums, and I have a quick question about the following:

“”"let’s say i want to load bigscience/bloom-1b7 model, and i have just enough GPU RAM to fit the entire model except the lm_head. Therefore write a custom device_map as follows:

device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}
This part was taken from here: Quantize 🤗 Transformers models (huggingface.co)
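For reference, a minimal sketch of how such a device_map is passed when loading the model (the 8-bit quantization arguments from the linked guide are omitted here):

```python
from transformers import AutoModelForCausalLM

device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}

# Modules mapped to 0 go on the first GPU; "cpu" entries
# stay in CPU RAM and are offloaded during inference.
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    device_map=device_map,
)
```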

How can one know the architecture of a particular model before actually trying to set up a custom device_map?
If I try help(model) to get its architecture, I won't be able to load the model on a GPU in the first place. If that's the case, how can I know which particular layers I can offload between CPU and GPU in order to quantize it?

Let me know if anything in my question is unclear.
Thanks!


Yeah, that's a question I've been wondering about too - how do you find the layers for each model? I looked at the bigscience/bloom-1b7 · Hugging Face page and searched for "lm_head", but wasn't able to find any results.

One naive solution I found was to load the model on a machine with a larger GPU, store its device map somewhere, and later adapt that device map to the smaller GPU for CPU and GPU offloading.
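In code, that workflow might look something like this (a sketch; the file name and the choice to offload lm_head are just examples):

```python
import json
from transformers import AutoModelForCausalLM

# --- On the machine with the larger GPU ---
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    device_map="auto",  # let accelerate place the layers
)
# hf_device_map records where each module ended up
with open("bloom_device_map.json", "w") as f:
    json.dump(model.hf_device_map, f, indent=2)

# --- Later, on the smaller-GPU machine ---
with open("bloom_device_map.json") as f:
    device_map = json.load(f)
device_map["lm_head"] = "cpu"  # push what doesn't fit to CPU
model = AutoModelForCausalLM.from_pretrained(
    "bigscience/bloom-1b7",
    device_map=device_map,
)
```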

The model.hf_device_map attribute will tell you the device map of a Hugging Face model.
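For the original question of discovering the layer names without a GPU, one possible approach (not mentioned above, but from the same Accelerate toolbox) is init_empty_weights, which builds the model skeleton on the meta device without allocating any weights:

```python
from accelerate import init_empty_weights
from transformers import AutoConfig, AutoModelForCausalLM

# Build the model structure only; no weights are downloaded
# or allocated, so this needs neither a GPU nor much RAM.
config = AutoConfig.from_pretrained("bigscience/bloom-1b7")
with init_empty_weights():
    model = AutoModelForCausalLM.from_config(config)

# Printing the model shows the full module tree (including
# lm_head), whose names are valid device_map keys.
print(model)
```

Accelerate also provides infer_auto_device_map(), which can generate a starting device_map from such an empty model.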
