Hello community, greetings!
This is my first post on the forums, and I have a quick question.
The docs say:

> "Let's say you want to load the bigscience/bloom-1b7 model, and you have just enough GPU RAM to fit the entire model except the lm_head. You can therefore write a custom device_map as follows:"

```python
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}
```
This snippet is taken from the Quantize Transformers models page (huggingface.co).
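For context, this is how I understand the map being used: each key is a top-level module name and each value is the device it goes to (a GPU index or `"cpu"`). A minimal sketch, with the actual checkpoint load left commented out since it needs the download and a GPU:

```python
# Custom device_map: everything on GPU 0 except lm_head, which is
# offloaded to CPU (taken from the quantization docs snippet).
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}

# To actually load with it (downloads the checkpoint, needs a GPU):
# from transformers import AutoModelForCausalLM
# model = AutoModelForCausalLM.from_pretrained(
#     "bigscience/bloom-1b7", device_map=device_map
# )
```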
How can one know the architecture of a particular model before actually trying to set up a custom device_map?
If I try help(model) to inspect its architecture, in general I won't be able to load the model onto a GPU in the first place. If that's the case, how can I know which layers to offload between CPU and GPU in order to quantize it?
Let me know if anything in my question needs clarification.
Thanks!