Hi, I am running the Llama 3 chat model (meta-llama/Meta-Llama-3-8B-Instruct) from Hugging Face, using the following code:
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = 'meta-llama/Meta-Llama-3-8B-Instruct'
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto", torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

... # processing data

inputs = tokenizer(prompt, return_tensors='pt').to("cuda")  # prompt is my instruction in natural language
output = model.generate(**inputs, max_new_tokens=40)  # returns a tensor of generated token IDs
```
And the output is:

```
tensor([[ 45147,  31868,  65562,  ...,   7566, 128009, 128001]],
       device='cuda:0')
```
This is what I want, but I also need the logits of the Llama-3 model. How can I get them? One solution is calling the model() forward method, but a plain forward pass does not return the generated token IDs shown above.
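For context, this is roughly what I mean by calling model() directly — a minimal sketch reusing the `model` and `inputs` variables from my snippet above. It gives me logits over the vocabulary for each prompt position, but no generated continuation:

```python
import torch

# Plain forward pass: scores each position of the prompt, generates nothing.
with torch.no_grad():
    forward_out = model(**inputs)

# Shape: (batch_size, prompt_length, vocab_size) -- logits for the prompt
# tokens only, with no generated sequence like the generate() output above.
print(forward_out.logits.shape)
```

I have also seen that generate() accepts return_dict_in_generate=True together with output_scores=True, which (if I understand the docs correctly) returns the sequences plus per-step scores, but I am not sure whether those scores are the raw logits I need:

```python
gen_out = model.generate(
    **inputs,
    max_new_tokens=40,
    return_dict_in_generate=True,  # structured output instead of a bare tensor
    output_scores=True,            # per-step scores (are these the raw logits?)
)

print(gen_out.sequences)    # same token IDs as the bare output tensor above
print(len(gen_out.scores))  # one (batch_size, vocab_size) tensor per generated step
```

(Newer transformers versions also seem to have an output_logits=True flag on generate() for the unprocessed logits, but I have not verified which version introduced it.) Is one of these the right way to get the logits, or is there a better approach?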