Different results from `model.generate` depending on batch size?

I'm getting very different results from `model.generate` for question generation with ProphetNet, depending on how many questions I generate at once.

```python
from transformers import ProphetNetTokenizer, ProphetNetForConditionalGeneration

model = ProphetNetForConditionalGeneration.from_pretrained('microsoft/prophetnet-large-uncased-squad-qg')
tokenizer = ProphetNetTokenizer.from_pretrained('microsoft/prophetnet-large-uncased-squad-qg')

fact1 = "Bill Gates [SEP] Microsoft was founded by Bill Gates and Paul Allen on April 4, 1975."

fact2 = "the New Right [SEP] The late 1980s and early 1990s saw the collapse of most of those socialist states that had professed a Marxist–Leninist ideology. In the late 1970s and early 1980s, the emergence of the New Right and neoliberal capitalism as the dominant ideological trends in Western politics championed by United States president Ronald Reagan and British prime minister Margaret Thatcher led the West to take a more aggressive stand towards the Soviet Union and its Leninist allies. Meanwhile, the reformist Mikhael Gorbachev became General Secretary of the Communist Party of the Soviet Union in March 1985 and sought to abandon Leninist models of development towards social democracy. Ultimately, Gorbachev's reforms, coupled with rising levels of popular ethnic nationalism, led to the dissolution of the Soviet Union in late 1991 into a series of constituent nations, all of which abandoned Marxist–Leninist models for socialism, with most converting to capitalist economies."

fact3 = """Paul Lafarguel [SEP] Engels did not support the use of the term Marxism to describe either Marx's or his own views.:12 He claimed that the term was being abusively used as a rhetorical qualifier by those attempting to cast themselves as real followers of Marx while casting others in different terms such as Lassallians.:12 In 1882, Engels claimed that Marx had criticized self-proclaimed Marxist Paul Lafargue by saying that if Lafargue's views were considered Marxist, then "one thing is certain and that is that I am not a Marxist.":12"""

# All three facts in one batch
inputs = tokenizer([fact1, fact2, fact3], padding=True, truncation=False, return_tensors="pt")
# Output:
# ['what is one example of a person who founded a company?',
#  'the collapse of the soviet union led to the rise of which political party?',
#  'who was a self - proclaimed marxist?']

# fact1 and fact2 only
# inputs = tokenizer([fact1, fact2], padding=True, truncation=False, return_tensors="pt")
# Output:
# ['???',
#  'the collapse of the soviet union in the late 1980s saw the collapse of what political party?']

# fact1 alone
# inputs = tokenizer([fact1], padding=True, truncation=False, return_tensors="pt")
# Output:
# ['along with paul allen, who founded microsoft?']

# fact2 alone
# inputs = tokenizer([fact2], padding=True, truncation=False, return_tensors="pt")
# Output:
# ['along with neoliberal capitalism, what political movement emerged in the late 1970s and early 1980s?']

# fact3 alone
# inputs = tokenizer([fact3], padding=True, truncation=False, return_tensors="pt")
# Output:
# ['who did marx criticize in 1882?']

# Generate questions (attention_mask is not passed in this version)
question_ids = model.generate(inputs['input_ids'], num_beams=5, early_stopping=True)
tokenizer.batch_decode(question_ids, skip_special_tokens=True)
```
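For context on how these batches are built: with `padding=True`, every sequence is padded to the length of the longest one in the batch, and `attention_mask` marks which positions hold real tokens. Because `fact1` is far shorter than `fact2`, it is heavily padded whenever the two share a batch. The plain-Python sketch below mimics that bookkeeping with made-up token ids; `pad_to_longest` and `PAD_ID` are hypothetical names, right-side padding is assumed, and this is not the tokenizer's actual implementation.

```python
from typing import List, Tuple

# Hypothetical pad id for illustration only; the real tokenizer has its own pad_token_id.
PAD_ID = 0


def pad_to_longest(batch: List[List[int]], pad_id: int = PAD_ID) -> Tuple[List[List[int]], List[List[int]]]:
    """Right-pad each sequence to the longest one in the batch.

    Returns (input_ids, attention_mask): the mask is 1 for real tokens
    and 0 for padding positions.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids = []
    attention_mask = []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return input_ids, attention_mask


# Made-up token ids: a short "fact" and a much longer one.
short_fact = [101, 7, 8, 9, 102]
long_fact = [101] + list(range(10, 40)) + [102]

ids, mask = pad_to_longest([short_fact, long_fact])
# Padded length of the short fact vs. how many of its positions are real tokens.
print(len(ids[0]), sum(mask[0]))
```

In this toy batch the short sequence ends up mostly padding, which is the situation `fact1` is in when batched with `fact2`.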

Each commented-out `inputs = ...` line is an alternative batch; the output listed under it is what I got when running with that line instead of the three-fact batch.

I don't understand what could be causing this. In particular, the results seem best when generating one question at a time, and in the two-fact batch `fact1` produces only '???'.

I found out about `attention_mask`, but passing it makes no difference:

```python
question_ids = model.generate(inputs['input_ids'], attention_mask=inputs['attention_mask'], num_beams=5, early_stopping=True)
```
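In principle, an attention mask makes padding invisible to attention: masked positions get a score of -inf before the softmax and therefore zero weight. The toy below illustrates this with single-query dot-product attention over scalar values. The helpers `softmax` and `attend` and all numbers are made up for illustration; this is a generic sketch, not ProphetNet's attention code.

```python
import math
from typing import List, Optional


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax; a score of -inf contributes exactly 0."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: List[float], keys: List[List[float]], values: List[float],
           mask: Optional[List[int]] = None) -> float:
    """Single-query dot-product attention returning a weighted sum of scalar values."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    if mask is not None:
        # Positions with mask == 0 get -inf, i.e. zero weight after softmax.
        scores = [s if keep else float("-inf") for s, keep in zip(scores, mask)]
    weights = softmax(scores)
    return sum(w * v for w, v in zip(weights, values))


query = [1.0, 1.0]

# Unpadded sequence: two real tokens.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [2.0, 4.0]
unpadded = attend(query, keys, values)

# Same sequence plus one padding position (made-up pad key and value).
padded_keys = keys + [[0.5, 0.5]]
padded_values = values + [0.0]
mask = [1, 1, 0]

with_mask = attend(query, padded_keys, padded_values, mask)
without_mask = attend(query, padded_keys, padded_values)

print(unpadded, with_mask, without_mask)
```

With the mask, the padded result matches the unpadded one; without it, the pad position takes a share of the attention and shifts the output. If the mask were honoured everywhere, batching alone should not change what the model sees, which is why the behaviour above is surprising.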