Argmax of Generation Probabilities doesn't match with Generated Sequence Tokens

I am using a BART model to build a chatbot. I generate a response from the model as follows:

    generated_output = model.generate(
        input_ids=input_ids,
        attention_mask=mask,
        output_scores=True,
        return_dict_in_generate=True,
    )

But the sequence decoded from the ‘sequences’ field:

    gen_tokens = generated_output['sequences']
    gen_tokens_seq = [tokenizer.decode(g, skip_special_tokens=True) for g in gen_tokens]

and the one obtained from argmax(scores):

    gen_ids = []
    num_generated_tokens = len(generated_output['scores'])
    for i1 in range(num_generated_tokens):
        # Scores for generation step i1, first row of the batch
        temptensor = generated_output['scores'][i1][0]
        # Keep the highest-scoring token id at this step
        gen_ids.append(torch.argmax(temptensor).item())
    gen_ids = torch.tensor(gen_ids).view(1, -1)
    gen_ids_seq = [tokenizer.decode(g, skip_special_tokens=True) for g in gen_ids]

are not the same. I need the logit vectors for the token sequence produced by model.generate(), but the scores it returns do not match that sequence. How can I get the logit values for the ‘sequences’ returned by model.generate()?

Okay, I think I found it.

I had to set

    num_beams=1, do_sample=False

explicitly in the model.generate() call to get logit values that match the generated sequence. According to the Hugging Face documentation these are already the defaults, but I still had to set them explicitly.
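One possible reason the explicit arguments mattered, which I have not checked for the checkpoint used here: generate() takes its defaults from the model's generation config, and some checkpoints enable beam search there. With beam search, each entry of 'scores' has one row per beam, so row 0 need not follow the path of the returned sequence. With greedy decoding (num_beams=1, do_sample=False), the per-step scores line up with 'sequences', and the comparison can be sketched as below. The helper names `greedy_ids_from_scores` and `chosen_token_scores` are my own, not library functions, and the tensors at the bottom are small stand-ins for a real generate() output.

```python
import torch


def greedy_ids_from_scores(scores):
    """Argmax token id at every generation step.

    `scores` is a tuple with one (batch_size, vocab_size) tensor per step,
    shaped like generated_output['scores'] under greedy decoding.
    """
    stacked = torch.stack(scores, dim=1)  # (batch, steps, vocab)
    return stacked.argmax(dim=-1)         # (batch, steps)


def chosen_token_scores(scores, sequences):
    """Score assigned to each token that actually appears in `sequences`.

    Assumes greedy decoding, so row i of every score tensor belongs to
    row i of `sequences`. Only the last `steps` tokens are used, which
    skips the decoder start token that has no score entry.
    """
    stacked = torch.stack(scores, dim=1)      # (batch, steps, vocab)
    steps = stacked.shape[1]
    generated = sequences[:, -steps:]         # (batch, steps)
    return stacked.gather(-1, generated.unsqueeze(-1)).squeeze(-1)


# Stand-in tensors (hypothetical values): vocabulary of 3, two steps,
# and a leading start token with id 2 in the sequence.
fake_scores = (
    torch.tensor([[0.1, 2.0, 0.3]]),
    torch.tensor([[1.5, 0.2, 0.1]]),
)
fake_sequences = torch.tensor([[2, 1, 0]])

argmax_ids = greedy_ids_from_scores(fake_scores)
# Under greedy decoding the argmax ids should equal the generated tokens.
print(torch.equal(argmax_ids, fake_sequences[:, -argmax_ids.shape[1]:]))
print(chosen_token_scores(fake_scores, fake_sequences))
```

With a real model, `generated_output['scores']` and `generated_output['sequences']` would be passed in place of the stand-in tensors.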

Please correct me if I am wrong on this.