Is BART guaranteed to not mess up unmasked tokens during text infilling?

Hi all, I am following this example in the documentation to do text infilling with BART. The model is expected to produce the following output given the input:

Input:  UN Chief Says There Is No **<mask>** in Syria
Output: UN Chief Says There Is No **Plan to Stop Chemical Weapons** in Syria
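For reference, here is roughly what the docs example looks like (a minimal sketch; the `facebook/bart-large` checkpoint and the generation settings are my assumptions, not necessarily what the docs use):

```python
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

text = "UN Chief Says There Is No <mask> in Syria"
batch = tokenizer(text, return_tensors="pt")

# generate() runs beam search over the *entire* output sequence,
# including the tokens that were never masked in the input.
generated_ids = model.generate(batch["input_ids"], num_beams=4, max_length=20)
print(tokenizer.batch_decode(generated_ids, skip_special_tokens=True))
```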

When I look into the generate() method, it appears that the output sentence is generated token by token, from the first (UN) to the last (Syria), by beam search.
So I am not sure how the unmasked tokens are kept unchanged during decoding. For example, how does it prevent the following from happening?

Input:  UN Chief Says There Is No **<mask>** in Syria
Output: UN Chief Says There Is No **Plan to Stop Chemical Weapons** in **Iraq**

Is it possible that BART can mess up unmasked tokens (with low probability)? If not, what does it do to prevent that?

Thanks.

I know this is old, but I just stumbled across it so I thought I’d reply in case it helps someone else along the way.

I’ve been struggling with this, too. Since the training is based on “noising” the data, the model is actually always predicting everything, not just the mask tokens. So yes, it can mess up the unmasked tokens, and no, there’s no way to force the unmasked tokens through the model (at least none that I’ve found).
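You can at least detect when this happens by checking the unmasked fragments after generation. A rough sketch (the helper name and the naive substring matching are mine, not anything built into transformers; whitespace and tokenization quirks may need handling in practice):

```python
def context_preserved(masked_input: str, output: str, mask_token: str = "<mask>") -> bool:
    """Return True if every unmasked fragment of the input appears verbatim in the output."""
    fragments = [f.strip() for f in masked_input.split(mask_token) if f.strip()]
    return all(fragment in output for fragment in fragments)

print(context_preserved(
    "UN Chief Says There Is No <mask> in Syria",
    "UN Chief Says There Is No Plan to Stop Chemical Weapons in Iraq",
))
# -> False: "in Syria" was changed to "in Iraq"
```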

I’m currently experimenting with tweaks to my dataset to do infilling with GPT-2, based on the recent OpenAI paper on “FIM” (fill-in-the-middle) models: https://arxiv.org/pdf/2207.14255.pdf

Basically, they’ve shown that you can do infilling just by shuffling >= 50% of your dataset to place the middle at the end (details in the paper). You have to add some special (sentinel) tokens so the model knows you’ve moved things around, but the idea is actually super simple. So far I haven’t had much success with multiple-mask infilling this way, though that’s what I’m working on. A rough sketch of the transformation is below.
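Here is roughly what that data transformation looks like (a sketch only; the sentinel strings and the character-level split are placeholders, the paper gives the exact recipe and token-level details):

```python
import random

# Placeholder sentinel tokens; in practice, add whatever special tokens
# you register with your tokenizer so the model can tell the parts apart.
PRE, SUF, MID = "<|fim_prefix|>", "<|fim_suffix|>", "<|fim_middle|>"

def fim_transform(doc: str, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, split doc into (prefix, middle, suffix)
    at two random points and move the middle to the end."""
    if random.random() > fim_rate:
        return doc  # leave the rest of the data in ordinary left-to-right order
    i, j = sorted(random.sample(range(len(doc) + 1), 2))
    prefix, middle, suffix = doc[:i], doc[i:j], doc[j:]
    # Training target: the model sees prefix + suffix as context,
    # then learns to emit the middle after the <MID> sentinel.
    return f"{PRE}{prefix}{SUF}{suffix}{MID}{middle}"
```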

The bonus of this approach, of course, is that the unmasked content is used only as context at inference time, so you always get back your unmasked content exactly as you input it. With BART you’d probably need a huge dataset to get this to work reliably, from what I’ve gathered.
