Solved!
Apparently ‘input_ids’, ‘attention_mask’, and ‘token_type_ids’ all need to be of size (batch_size, sequence_length), so it worked once I used .unsqueeze(0) instead of .squeeze(0) to add the batch dimension.
In addition, the tokenizer should be called with the parameter is_split_into_words=True, so a single pre-tokenized sequence isn't mistaken for a batch of sequences.
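For reference, here's a minimal sketch of what worked for me (the checkpoint name and example words are just placeholders, and the token-classification head is untrained, so this only demonstrates the shapes):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased")

# Input that is already split into words (one sequence, not a batch)
words = ["Hugging", "Face", "is", "based", "in", "NYC"]

# is_split_into_words=True tells the tokenizer this is ONE pre-tokenized
# sequence rather than a batch of separate sequences.
encoding = tokenizer(words, is_split_into_words=True)

# The encoding values are plain 1-D lists here, so build tensors and
# unsqueeze(0) to get the (batch_size, sequence_length) shape the model expects.
inputs = {k: torch.tensor(v).unsqueeze(0) for k, v in encoding.items()}

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.logits.shape)  # (1, sequence_length, num_labels)
```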
Hope this helps others stuck on the same thing.