Solved!
Apparently ‘input_ids’, ‘attention_mask’, and ‘token_type_ids’ all need to be of size (batch_size, sequence_length), so it worked once I used .unsqueeze(0) instead of .squeeze(0) to add the batch dimension.
In addition, the tokenizer should be called with the parameter is_split_into_words=True, so a single pre-tokenized sequence isn't mistaken for a batch of sequences.
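For reference, here's a minimal sketch of what worked for me (the checkpoint name and example words are just placeholders, and the token-classification head is untrained, so this only demonstrates the shapes):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased")

# Input that is already split into words (one sequence, not a batch)
words = ["Hugging", "Face", "is", "based", "in", "NYC"]

# is_split_into_words=True tells the tokenizer this is ONE pre-tokenized
# sequence rather than a batch of separate sequences.
encoding = tokenizer(words, is_split_into_words=True)

# The encoding values are plain 1-D lists here, so build tensors and
# unsqueeze(0) to get the (batch_size, sequence_length) shape the model expects.
inputs = {k: torch.tensor(v).unsqueeze(0) for k, v in encoding.items()}

with torch.no_grad():
    outputs = model(**inputs)

print(outputs.logits.shape)  # (1, sequence_length, num_labels)
```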
Hope this helps others stuck on the same thing.