Help understanding Llama output without sampling

Can someone help me understand what happens when I call Llama without the sampling boilerplate? Intuitively, without sampling, the output should just be the most probable next words. However, when I call the model, the output does not start with the sentence-start token, and the words seem to be mixed up.
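
For reference, this is roughly what I mean by the "sampling boilerplate" that I am skipping (a minimal sketch assuming a standard Hugging Face transformers causal LM; the checkpoint name is just a placeholder):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
    model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

    inp = tokenizer("What is an alpaca?", return_tensors="pt")

    # generate() runs an autoregressive loop: each step takes the most probable
    # next token (do_sample=False, i.e. greedy) and appends it to the input.
    generated = model.generate(**inp, max_new_tokens=32, do_sample=False)
    print(tokenizer.decode(generated[0], skip_special_tokens=True))

Instead of that loop, I call the model directly with a single forward pass.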

For example, here is my input:

What is an alpaca?

I run the following to get the output:

    prompt = "What is an alpaca?"
    inp = tokenizer(prompt, return_tensors='pt')
    # Single forward pass: no generation loop, just logits for each prompt position
    output = model.base_model.model(inp['input_ids'].to(device), inp['attention_mask'].to(device))

Here is the output of tokenizer.decode(torch.argmax(output.logits[0], dim=1).tolist()):

' Below is the examplekalaca?\n'
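
If my understanding is right, the logits at position i are the model's prediction for position i + 1, so getting the single most probable next token after the whole prompt would look something like this (a sketch, assuming output.logits has shape (batch, seq_len, vocab_size)):

    # Only the last position's logits score the token that follows the full prompt.
    next_token_id = torch.argmax(output.logits[0, -1]).item()
    print(tokenizer.decode([next_token_id]))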

This is not what I would expect. Additionally, when inspecting the fine-tuning script, I see that the loss is computed as the cross-entropy over the full input and output, whereas I would have expected the loss to be computed only on the next token.
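
For reference, this is my simplified reading of how the causal LM loss is usually computed (as in the transformers LlamaForCausalLM forward pass); the variable names are mine, not the exact fine-tuning script:

    import torch
    import torch.nn.functional as F

    def causal_lm_loss(logits: torch.Tensor, labels: torch.Tensor) -> torch.Tensor:
        # logits: (batch, seq_len, vocab_size); labels: (batch, seq_len)
        # Each position is trained to predict the *following* token, so logits and
        # labels are shifted by one before a cross-entropy over the whole sequence.
        shift_logits = logits[:, :-1, :]
        shift_labels = labels[:, 1:]
        return F.cross_entropy(
            shift_logits.reshape(-1, shift_logits.size(-1)),
            shift_labels.reshape(-1),
            ignore_index=-100,  # padded / masked positions are excluded
        )

So if I read this correctly, the loss is not on a single next token but on the next-token prediction at every position of the sequence. Is that the intended behaviour, and is it related to the shifted-looking output above?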