Hi,
I am using the transformers library to run inference on several models. I successfully got results running llama2-7b-chat-hf and open-llama-7b-v2 on a single GPU (model.to(device)), but to run the 13b models I need to resort to device_map="auto", which leads to the following error:
RuntimeError: probability tensor contains either `inf`, `nan` or element < 0
I also tried the 7b llama2 and open-llama models with device_map="auto" and get the same error. I have seen a number of posts reporting this error, but the causes people identified do not apply to my setup. Would anyone be able to help me figure this out?
Here is how I am loading the model:
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained(
    model_name,
    add_prefix_space=True,
    use_auth_token=hg_token,
    padding_side="left",
    legacy=False,
)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    use_auth_token=hg_token,
    device_map="balanced_low_0",  # same error with device_map="auto"
)
model.eval()

# Llama has no pad token, so reuse the EOS token for padding
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = model.config.eos_token_id
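For reference, here is a minimal sketch of how generation is invoked (the prompt and sampling settings below are placeholders for my actual values). The error message comes from torch.multinomial, i.e. it is raised during sampling inside model.generate():

import torch

# Placeholder prompt and generation settings, shown for context
prompt = "What is the capital of France?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        max_new_tokens=128,
        do_sample=True,  # sampling path, where torch.multinomial raises the error
        temperature=0.7,
        top_p=0.9,
        pad_token_id=tokenizer.pad_token_id,
    )
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))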