Correct way to use greedy_search with the BART model

Hello,

I am trying to use greedy_search with the BART-base model, but I am running into several problems, listed below:

If I call greedy_search the same way we call generate, it raises a ValueError: One of input_ids or input_embeds must be specified

from transformers import AutoModelForSeq2SeqLM, AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")

inputs = ["Complete this sentence with something:", "Tell a joke: A donkey walks into a bar"]
input_ids = tokenizer(inputs, return_tensors="pt", padding=True)

# This call is the one that raises the ValueError described above.
outputs = model.greedy_search(**input_ids, max_new_tokens=12, return_dict_in_generate=True, output_scores=True)
print(tokenizer.batch_decode(outputs.sequences, skip_special_tokens=False))

I understand from here that greedy_search cannot be called this way on encoder-decoder models.
Following the link above, I wrote a slightly modified version for BART:

from transformers import AutoModelForSeq2SeqLM, AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForSeq2SeqLM.from_pretrained("facebook/bart-base")
tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")

inputs = ["Complete this sentence with something:", "Tell a joke: A donkey walks into a bar"]
input_ids = tokenizer(inputs, return_tensors="pt", padding=True)

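# Run the encoder once up front. Note that no attention_mask is passed here, although the batch is padded.
encoder_outputs = model.get_encoder()(input_ids.input_ids)
# Start every sequence in the batch with the decoder start token.
decoder_input_ids = torch.ones_like(input_ids.input_ids)[:, :1] * model.config.decoder_start_token_id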

model_kwargs = {"encoder_outputs": encoder_outputs}

outputs = model.greedy_search(decoder_input_ids, max_new_tokens=15, return_dict_in_generate=True, output_scores=True, **model_kwargs)
print(tokenizer.batch_decode(outputs.sequences, skip_special_tokens=False))

This code runs and produces output; however, the output does not match what I get when I use

outputs = model.generate(input_ids=input_ids.input_ids, attention_mask=input_ids.attention_mask, do_sample=False, return_dict_in_generate=True, output_scores=True, max_new_tokens=12, num_beams=1)

which is equivalent to a greedy search if I understand correctly.
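
A possible explanation, offered as an assumption rather than a confirmed diagnosis: generate typically builds a list of logits processors from the model's generation settings and applies them before taking the argmax at each step, while a direct greedy_search call applies only the processors passed to it explicitly. The toy sketch below does not use transformers at all; toy_logits, block_repeat, and greedy_decode are hypothetical names chosen for illustration. It only shows that the same argmax loop can yield different sequences depending on whether a processor runs first.

```python
from typing import Callable, List, Sequence

VOCAB_SIZE = 5
EOS = 4  # hypothetical end-of-sequence token id for this toy vocabulary

Processor = Callable[[List[int], List[float]], List[float]]


def toy_logits(prefix: List[int]) -> List[float]:
    """Hypothetical stand-in for one decoder step (not a real model).

    Scores the last emitted token highest and the following id second.
    """
    last = prefix[-1]
    scores = [0.0] * VOCAB_SIZE
    scores[last] = 2.0
    scores[(last + 1) % VOCAB_SIZE] = 1.0
    return scores


def block_repeat(prefix: List[int], scores: List[float]) -> List[float]:
    """Toy processor: forbid emitting the same token twice in a row."""
    adjusted = list(scores)
    adjusted[prefix[-1]] = float("-inf")
    return adjusted


def greedy_decode(start: int, max_new_tokens: int,
                  processors: Sequence[Processor] = ()) -> List[int]:
    """Greedy search: apply processors, then take the argmax at every step."""
    seq = [start]
    for _ in range(max_new_tokens):
        scores = toy_logits(seq)
        for proc in processors:
            scores = proc(seq, scores)
        next_token = max(range(VOCAB_SIZE), key=scores.__getitem__)
        seq.append(next_token)
        if next_token == EOS:
            break
    return seq


plain = greedy_decode(0, max_new_tokens=6)
processed = greedy_decode(0, max_new_tokens=6, processors=[block_repeat])
print("without processors:", plain)
print("with processors:   ", processed)
```

If this is the cause, passing the same processors to greedy_search, or disabling them in generate, should bring the two outputs closer together. This has not been verified here against bart-base.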

Can someone please help me pin down the exact behavior of the greedy_search method in the context of BART models? Any help would be greatly appreciated!
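
One difference visible in the two calls above: the generate call receives attention_mask, while the manual path passes only input_ids to the encoder even though the batch is padded. The sketch below is a standalone toy, not the BART implementation; attend is a hypothetical single-query attention helper over scalar values. It illustrates how an unmasked padding position can shift an attention output, which may be part of why the two results differ.

```python
import math
from typing import List, Optional


def attend(query: List[float], keys: List[List[float]], values: List[float],
           mask: Optional[List[int]] = None) -> float:
    """Toy dot-product attention for one query over scalar values.

    mask uses 1 for real tokens and 0 for padding, mirroring the usual
    attention_mask convention; masked positions get zero weight.
    """
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if mask is not None:
        scores = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    top = max(scores)  # subtract the max for a numerically stable softmax
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w / total * v for w, v in zip(weights, values))


# Two real positions followed by one padding position with an arbitrary value.
query = [0.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
values = [1.0, 2.0, 100.0]
mask = [1, 1, 0]

unmasked = attend(query, keys, values)      # padding position contributes
masked = attend(query, keys, values, mask)  # padding position is ignored
print("unmasked:", unmasked)
print("masked:  ", masked)
```

In the manual path, the analogous change would be to pass the attention mask both to the encoder call and along with encoder_outputs in model_kwargs; whether that alone closes the gap is an assumption that would need checking.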