Help understanding Llama output without sampling

Can someone help me understand what happens when I call Llama without the sampling boilerplate? Intuitively, without sampling, the output should just be the most probable next words. However, when I call the model, the output does not start with the sentence-start token, and the words seem to be mixed up.
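
For reference, this is roughly what I mean by the "sampling boilerplate" that I am skipping (a minimal sketch assuming a standard Hugging Face transformers causal LM; the checkpoint name is just a placeholder):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    inp = tokenizer("What is an alpaca?", return_tensors="pt")

    # generate() runs an autoregressive loop: each step takes the most probable
    # next token (do_sample=False, i.e. greedy) and appends it to the input.
    generated = model.generate(**inp, max_new_tokens=32, do_sample=False)
    print(tokenizer.decode(generated[0], skip_special_tokens=True))

Instead of that loop, I call the model directly with a single forward pass.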

For example, here is my input:

What is an alpaca?

I run the following to get the output:

    prompt = "What is an alpaca?"
    inp = tokenizer(prompt, return_tensors='pt')
    # Single forward pass: no generation loop, just logits for each prompt position
    output = model.base_model.model(inp['input_ids'].to(device), inp['attention_mask'].to(device))

Here is the output of tokenizer.decode(torch.argmax(output.logits[0], dim=1).tolist()):

' Below is the examplekalaca?\n'
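
If my understanding is right, the logits at position i are the model's prediction for position i + 1, so getting the single most probable next token after the whole prompt would look something like this (a sketch, assuming output.logits has shape (batch, seq_len, vocab_size)):

    # Only the last position's logits score the token that follows the full prompt.
    next_token_id = torch.argmax(output.logits[0, -1]).item()
    print(tokenizer.decode([next_token_id]))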

This is not what I would expect. Additionally, when inspecting the fine-tuning script, I see that the loss is computed as the cross-entropy over the full input and output, whereas I would have expected the loss to be computed only on the next token.
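
For reference, this is my simplified reading of how the causal LM loss is usually computed (as in the transformers LlamaForCausalLM forward pass); the variable names are mine, not the exact fine-tuning script:

    import torch
    import torch.nn.functional as F

    def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # logits: (batch, seq_len, vocab_size); labels: (batch, seq_len)
        # Each position is trained to predict the *following* token, so logits and
        # labels are shifted by one before a cross-entropy over the whole sequence.
        shift_logits = logits[:, :-1, :]
        shift_labels = labels[:, 1:]
        return F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_labels.reshape(-1),
            ignore_index=-100,  # padded / masked positions are excluded
        )

So if I read this correctly, the loss is not on a single next token but on the next-token prediction at every position of the sequence. Is that the intended behaviour, and is it related to the shifted-looking output above?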