Hi! I would like to understand a bit better how BART handles multiple sentences. I got that I can encode two sentences with tokenizer(sent_a, sent_b). My first sentence contains a <mask> symbol that is to be filled. However, I noticed that the second sentence, unlike the first, isn’t part of the output (which is okay in my case, but I wonder why). In addition, the second sentence doesn’t seem to have any real influence on how the <mask> token is replaced, so it isn’t really treated as context, and it even seems to confuse the model. Can I actually input two sentences if I’m aiming for the mask-filling task? Would it make sense to finetune for it?
input_ids = tokenizer.encode(sent_a, sent_b, return_tensors="pt").to('cuda:0')
tokenizer.batch_decode(model.generate(input_ids))
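For reference, here is a fuller, self-contained version of the snippet above; the facebook/bart-large checkpoint and the example sentences are just assumptions to make it runnable, swap in whatever you actually use:

```python
from transformers import BartForConditionalGeneration, BartTokenizer

# Assumed checkpoint and placeholder sentences, only to make the sketch self-contained.
tokenizer = BartTokenizer.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

sent_a = "The picnic was cancelled because of the <mask>."
sent_b = "It had been raining all morning."

# Encoding the pair yields one sequence of the form <s> sent_a </s></s> sent_b </s>
input_ids = tokenizer.encode(sent_a, sent_b, return_tensors="pt")

# Generate a filled-in sequence and decode it, keeping the special tokens visible.
output_ids = model.generate(input_ids, num_beams=5, max_length=64)
print(tokenizer.batch_decode(output_ids, skip_special_tokens=False))
```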
I was wondering something similar. In my case, when I send in two sentences as something like:
<s> sentence_A </s><s> sentence_B </s>
the return is actually reversed:
</s><s> sentence_B </s><s> sentence_A </s>
The masks in the returned sentences have been infilled appropriately, so there isn’t necessarily a problem per se, but I wasn’t expecting the return to be formatted this way (it took me a while to realize what was going on).
If this is a consistent/intended pattern then I’m happy to just reverse the sentences when I get them back, but I wanted to verify that there isn’t something going wrong in my model that needs to be addressed.
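One way to narrow down where the swap happens (just a sketch, reusing whatever input_ids, tokenizer, and model are already loaded) is to decode the exact encoder input and compare it with the generation side by side:

```python
# Decode the exact encoder input, then the generated output, to see whether the
# reversal is introduced by the preprocessing or by the model itself.
print("model sees:   ", tokenizer.batch_decode(input_ids, skip_special_tokens=False))
print("model returns:", tokenizer.batch_decode(model.generate(input_ids),
                                               skip_special_tokens=False))
```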
Okay, this is a bit embarrassing… I took the forum examples of how to format “noisy” Bart data too literally and wound up swapping all of my input sentences. So actually Bart was just doing its job!
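In case it saves someone else the same confusion, here is roughly the formatting I should have been using (variable names are made up): the noising, e.g. swapped sentence order, belongs on the training inputs only, while the labels keep the original order, so if the swapped version also ends up in your inference inputs, the model will simply swap it back:

```python
# Hypothetical illustration of building one denoising training pair for BART:
# the *input* is the noised text (sentences in swapped order here), while the
# *labels* are the original text the model should reconstruct.
sent_a = "The report is due on <mask>."
sent_b = "Please send it to the whole team."

original = f"{sent_a} {sent_b}"   # target the model should produce
noised = f"{sent_b} {sent_a}"     # permuted input the model is trained to fix

inputs = tokenizer(noised, return_tensors="pt")
labels = tokenizer(original, return_tensors="pt").input_ids
# model(**inputs, labels=labels) would then give the denoising loss for fine-tuning.
```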