Fill-mask for BART with variable length

Hey All

I have tried to use BART in the fill-mask pipeline to predict masked tokens, but the predicted span can be more than one word, and the pipeline does not have an option for that.

In the documentation, I found this example of mask filling with BART:

from transformers import BartForConditionalGeneration, BartTokenizer

model = BartForConditionalGeneration.from_pretrained("facebook/bart-large", forced_bos_token_id=0)
tok = BartTokenizer.from_pretrained("facebook/bart-large")
example_english_phrase = "UN Chief Says There Is No <mask> in Syria"
batch = tok(example_english_phrase, return_tensors="pt")
generated_ids = model.generate(batch["input_ids"])
assert tok.batch_decode(generated_ids, skip_special_tokens=True) == [
    "UN Chief Says There Is No Plan to Stop Chemical Weapons in Syria"
]

My question is: how do I use this to obtain the top 5 predictions? In the fill-mask pipeline, this would be equivalent to

unmasker = pipeline("fill-mask", model=model_name, tokenizer=tokenizer, top_k=5)

The final output I am hoping to obtain is:

["plan to stop the war", .. etc another 5 predictions ]

@sgugger would you be able to lend a hand?