Thanks!
But when I do

encoding = tokenizer([prompt, prompt, prompt], [choice0, choice1, choice2], return_tensors='tf', padding=True)

the encoding looks like the following:
{'input_ids': <tf.Tensor: shape=(3, 23), dtype=int32, numpy=
array([[ 101, 5138, 1998, 4638, 16143, 1997, 5653, 2013, 2312,
3872, 5653, 2545, 1010, 18092, 2015, 1010, 1998, 16728,
1012, 102, 2051, 2968, 102],
[ 101, 5138, 1998, 4638, 16143, 1997, 5653, 2013, 2312,
3872, 5653, 2545, 1010, 18092, 2015, 1010, 1998, 16728,
1012, 102, 3015, 102, 0],
[ 101, 5138, 1998, 4638, 16143, 1997, 5653, 2013, 2312,
3872, 5653, 2545, 1010, 18092, 2015, 1010, 1998, 16728,
1012, 102, 3752, 26683, 102]], dtype=int32)>, 'token_type_ids': <tf.Tensor: shape=(3, 23), dtype=int32, numpy=
array([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
1],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
1]], dtype=int32)>, 'attention_mask': <tf.Tensor: shape=(3, 23), dtype=int32, numpy=
array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
0],
[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1]], dtype=int32)>}
which, as far as I understand, is encoded as 3 (prompt, choice) pairs of texts and not as one question with 3 choices. Namely, wouldn’t I want the encoding to look something like
[101, 5138, ..., 102, 2051, 2968, ..., 102, 3015, ..., 102, 3752, ..., 102]
In other words, if I want to fine-tune TFBertForMultipleChoice, don’t I need to encode the prompt and choices as prompt choice0 choice1 choice2?
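To make concrete the layout I have in mind, here is a toy sketch in plain Python (no transformers needed). The token ids are stand-ins I made up for illustration; only 101 ([CLS]) and 102 ([SEP]) correspond to BERT's actual special tokens:

```python
# Single-sequence layout I'm imagining:
# [CLS] prompt [SEP] choice0 [SEP] choice1 [SEP] choice2 [SEP]
CLS, SEP = 101, 102          # BERT's [CLS] and [SEP] token ids
prompt_ids = [5138, 1998, 4638]   # made-up stand-ins for the prompt tokens
choice_ids = [
    [2051, 2968],   # stand-in ids for choice0
    [3015],         # stand-in ids for choice1
    [3752, 26683],  # stand-in ids for choice2
]

# Concatenate everything into one sequence instead of 3 separate pairs
single_sequence = [CLS] + prompt_ids + [SEP]
for ids in choice_ids:
    single_sequence += ids + [SEP]

print(single_sequence)
```

That is, one flat input row containing the prompt followed by all three choices, rather than three separate rows.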
Thanks,
Ayala