Unable to add additional choices to VisualBertForMultipleChoice

Hi, I’m working with the brief tutorial given in the VisualBertForMultipleChoice section of the VisualBert page.

This is the code snippet I started with:

from transformers import AutoTokenizer, VisualBertForMultipleChoice
import torch

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = VisualBertForMultipleChoice.from_pretrained("uclanlp/visualbert-vcr")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
choice0 = "It is eaten with a fork and a knife."
choice1 = "It is eaten while held in the hand."

encoding = tokenizer([[prompt, prompt], [choice0, choice1]], return_tensors="pt", padding=True)

This runs without issue. However, when I add a third option to the choices:

from transformers import AutoTokenizer, VisualBertForMultipleChoice
import torch

tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
model = VisualBertForMultipleChoice.from_pretrained("uclanlp/visualbert-vcr")

prompt = "In Italy, pizza served in formal settings, such as at a restaurant, is presented unsliced."
choice0 = "It is eaten with a fork and a knife."
choice1 = "It is eaten while held in the hand."
choice2 = "It is eaten while torn into smaller pieces."

encoding = tokenizer([[prompt, prompt, prompt], [choice0, choice1, choice2]], return_tensors="pt", padding=True)

I get the following error:

TypeError: TextEncodeInput must be Union[TextInputSequence, Tuple[InputSequence, InputSequence]]

I am really stumped as to why this is happening. I’ve experimented with a few different ways to pass in the input sequences, but I continue to get this error. I’m confused because my input is a list of lists, which I understood to be an accepted input format.

I’ve spent quite a while on what feels like a simple issue :sweat_smile: I would really appreciate some help on this, thank you in advance!

Finally figured out the solution after referencing this post:

Feeling a bit silly that the issue was simply tokenizer(questions, choices) instead of tokenizer([questions, choices]) as I had been trying!
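
For anyone landing here later, this is a minimal sketch of the form that worked for me with three choices. The helper names build_pairs and encode_choices are my own and are not part of transformers. The unsqueeze(0) step is taken from the rest of the VisualBert page example, which, as I understand it, adds a batch dimension of size 1 in front of the choices.

```python
def build_pairs(prompt, choices):
    """Repeat the prompt once per choice so both lists have equal length.

    Hypothetical helper, not part of transformers.
    """
    questions = [prompt] * len(choices)
    return questions, list(choices)


def encode_choices(tokenizer, prompt, choices):
    """Tokenize (prompt, choice) pairs by passing two separate arguments.

    `tokenizer` is assumed to behave like a Hugging Face tokenizer:
    the first argument is the batch of first sequences and the second
    argument is the batch of second sequences.
    """
    questions, answers = build_pairs(prompt, choices)
    # Two positional arguments, NOT one list wrapping both lists.
    encoding = tokenizer(questions, answers, return_tensors="pt", padding=True)
    # Assumption: add a leading batch dimension of size 1, as the
    # VisualBert page example does for its two-choice case.
    return {key: value.unsqueeze(0) for key, value in encoding.items()}


# Usage (needs transformers and torch installed, and downloads the tokenizer):
# from transformers import AutoTokenizer
# tokenizer = AutoTokenizer.from_pretrained("google-bert/bert-base-uncased")
# inputs = encode_choices(tokenizer, prompt, [choice0, choice1, choice2])
```

Building the repeated prompt list from the number of choices means a fourth or fifth option only needs to be appended to the choices list.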

This was especially confusing since the example given on the VisualBert page follows the tokenizer([questions, choices]) format.

Can someone explain why this issue arose and why our inputs should not be in a list of lists for multiple choice?
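
My current understanding, which I would be glad to have confirmed: when the tokenizer receives a single list, it treats that list as the batch, and each inner list is read as one (text, text_pair) entry rather than as a column of questions or choices. An inner list of two items is therefore a valid pair, while an inner list of three items is not, which would match the TypeError above. The sketch below is plain Python that mimics that reading. Both function names are hypothetical and this is not the library's actual code.

```python
def read_single_argument_batch(batch):
    """Mimic how a batch passed as ONE argument seems to be interpreted.

    Each item is either a lone string or a pair of two sequences.
    Hypothetical illustration only, not the transformers implementation.
    """
    entries = []
    for item in batch:
        if isinstance(item, str):
            entries.append((item, None))
        elif len(item) == 2:
            entries.append((item[0], item[1]))
        else:
            raise TypeError(
                f"each entry must be a string or a pair, got {len(item)} items"
            )
    return entries


def read_two_argument_batch(texts, text_pairs):
    """Mimic the two-argument form: the i-th text is paired with the i-th text_pair."""
    if len(texts) != len(text_pairs):
        raise ValueError("texts and text_pairs must have the same length")
    return list(zip(texts, text_pairs))


prompt = "PROMPT"
choices = ["CHOICE0", "CHOICE1"]

# Two choices in the list-of-lists form: no error, but the pairing is
# (prompt, prompt) and (choice0, choice1), not (prompt, choice).
wrong = read_single_argument_batch([[prompt, prompt], choices])

# Two-argument form: each prompt is paired with its own choice.
right = read_two_argument_batch([prompt, prompt], choices)
```

If this reading is right, the two-choice example only runs by coincidence: two inner lists of length two happen to look like two valid pairs, so the mismatch stays hidden until a third choice is added.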