Hi, I am having this error and I don’t know what it means. Can someone explain?
This code triggers the error:
# device_map must be defined before it is passed to from_pretrained
device_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}
# base_model and quantization_config are defined elsewhere
model = LlamaForCausalLM.from_pretrained(
    base_model,
    # load_in_8bit=True,
    torch_dtype=torch.float16,
    device_map=device_map,
    quantization_config=quantization_config,
)
When you pass a custom device_map, add "model.embed_tokens" to it. In my case I set everything to "cpu" since I don’t have any GPUs; you can set yours as required.
You may also need to add "model.layers" and "model.norm".
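For reference, here is a minimal sketch of what such a device_map might look like for LlamaForCausalLM. The helper name llama_device_map is made up for illustration, and the four keys are the module names mentioned in this thread; other transformers versions or model variants may need additional entries.

```python
def llama_device_map(device="cpu"):
    """Build a device_map that covers the top-level modules of LlamaForCausalLM.

    `device` can be "cpu", "disk", or a GPU index such as 0.
    Hypothetical helper; the key names are taken from this thread.
    """
    return {
        "model.embed_tokens": device,  # token embedding matrix
        "model.layers": device,        # all decoder layers
        "model.norm": device,          # final normalization layer
        "lm_head": device,             # output projection
    }


# Everything on CPU, as in the reply above
device_map = llama_device_map("cpu")
print(device_map)
```

The resulting dict would then be passed as device_map=device_map to from_pretrained in place of the one with the transformer.* names.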
We have run into the same issue with llama2-70b.
Does anyone know why this error occurs in the device_map, or perhaps have a clear solution?
If anyone has detailed documentation regarding llama2’s tokenizer, sharing it would be greatly appreciated.
A device_map specifies where to place each of the individual parameters of the model. If a model supports it, it’s advised to use device_map="auto", which will automatically determine where to place each of the layers (using the priority of GPUs > CPUs > disk). Read more about it here: Handling big models for inference.
In your case, as you specify the device_map of the individual parameters, it will complain if you don’t include ALL of the model’s parameter names. In this case, the token embedding matrix of the LlamaForCausalLM model is called model.embed_tokens as can be seen here. Hence, the error will be raised because this was not specified in your device_map. I see you’ve specified transformer.word_embeddings, but that’s not the name of the token embedding matrix for this model.
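To see which parameters a hand-written device_map leaves out, a check along these lines can help. This is only a sketch in plain Python: uncovered_parameters is a hypothetical helper, and the prefix-matching rule (an entry covers a parameter if it equals the name or is a dotted prefix of it) is an assumption about how entries are matched. The parameter list is abbreviated by hand; with a real model you would take the names from model.named_parameters().

```python
def uncovered_parameters(param_names, device_map):
    """Return the parameter names that no device_map entry covers.

    Assumption: an entry covers a parameter when it equals the parameter
    name or is a dotted prefix of it; the empty string covers everything.
    """
    remaining = list(param_names)
    for module_name in device_map:
        if module_name == "":
            return []
        prefix = module_name + "."
        remaining = [
            name for name in remaining
            if name != module_name and not name.startswith(prefix)
        ]
    return remaining


# Abbreviated, hand-written sample of Llama parameter names
param_names = [
    "model.embed_tokens.weight",
    "model.layers.0.self_attn.q_proj.weight",
    "model.norm.weight",
    "lm_head.weight",
]

# The device_map from the question: only lm_head matches a Llama module
question_map = {
    "transformer.word_embeddings": 0,
    "transformer.word_embeddings_layernorm": 0,
    "lm_head": "cpu",
    "transformer.h": 0,
    "transformer.ln_f": 0,
}

# A device_map that uses the Llama module names
llama_map = {
    "model.embed_tokens": 0,
    "model.layers": 0,
    "model.norm": 0,
    "lm_head": "cpu",
}

missing_before = uncovered_parameters(param_names, question_map)
missing_after = uncovered_parameters(param_names, llama_map)
print(missing_before)
print(missing_after)
```

With the map from the question, everything under model.* is reported as uncovered; with the Llama names, nothing is left over.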