"batch_size, seq_length = input_shape" ValueError: too many values to unpack (expected 2) (Transformer sentence similarity classification)

I'm encountering a problem with a sentence similarity classification task using a Transformer.

It happens when I'm about to train my model. I load and process my dataset as shown below, and this is what the training dataset element shape looks like:
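Roughly, the preprocessing looks like this (a sketch standing in for the screenshot; the checkpoint, data source, and column names are placeholders, not my exact code):

```python
# Sketch of the preprocessing; the data source, column names, and
# checkpoint are placeholders for what the screenshot actually shows.
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")  # placeholder Chinese BERT

def tokenize_pair(example):
    # Pack both sentences of a pair into one sequence; token_type_ids marks
    # tokens of the first sentence with 0 and of the second with 1.
    return tokenizer(
        example["sentence1"], example["sentence2"],  # placeholder column names
        padding="max_length", truncation=True, max_length=64,
    )

dataset = load_dataset("csv", data_files={"train": "train.csv"})  # placeholder source
train_dataset = dataset["train"].map(tokenize_pair)
```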

But when I train the model, it says:

ValueError: Caught ValueError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
output = module(*input, **kwargs)
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 1564, in forward
outputs = self.bert(
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 967, in forward
batch_size, seq_length = input_shape
ValueError: too many values to unpack (expected 2)

So I tried to squeeze out the extra dimension.
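Roughly like this (again a sketch in place of the screenshot, assuming the elements are TensorFlow tensors as in the dump below):

```python
import tensorflow as tf

# Drop the leading size-1 axis so each feature goes from shape (1, 64) to (64,).
# train_dataset is assumed to be a tf.data.Dataset of tokenized examples;
# labels is already a scalar, so it is left untouched.
def squeeze_example(example):
    return {key: value if key == "labels" else tf.squeeze(value, axis=0)
            for key, value in example.items()}

train_dataset = train_dataset.map(squeeze_example)
```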
The training element looks like this now:
{'input_ids': <tf.Tensor: shape=(64,), dtype=int64, numpy=
array([ 101, 2769, 4761, 6887, 8024, 852, 3221, 800, 812, 3680, 2399,
6963, 833, 2214, 6407, 8024, 3680, 2399, 6963, 833, 6158, 2837,
2461, 8024, 6821, 1922, 2694, 6111, 749, 8024, 1728, 711, 800,
812, 6375, 872, 2828, 102, 1315, 886, 800, 812, 1038, 6387,
872, 2828, 2124, 2372, 6822, 1343, 8024, 800, 812, 3297, 5303,
6820, 3221, 833, 6158, 2803, 1139, 1343, 511, 102])>,
'token_type_ids': <tf.Tensor: shape=(64,), dtype=int64, numpy=
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])>,
'attention_mask': <tf.Tensor: shape=(64,), dtype=int64, numpy=
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])>,
'labels': <tf.Tensor: shape=(), dtype=int64, numpy=1>}

and the model was able to train, but it's extremely slow!

And since this is a sentence similarity classification task, I suppose the batch_size should be 2? But it's always 1. What should it be?

I don't really understand why, and I've been struggling with this for a long time. Can someone please help? Thanks!


This is how I process the dataset.

I think the first training element shape is correct, because that's what the official documentation presents, but the model just couldn't run with it.


Sorry, I can only embed one image because I'm a new user.

[image]

[image]
This is how I configure the model.
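In outline it is something like this (a sketch in place of the screenshot; the checkpoint, epochs, and output directory are placeholders, and the batch size of 64 matches my training arguments):

```python
from transformers import BertForSequenceClassification, Trainer, TrainingArguments

# num_labels=2 for the two classes: similar / not similar.
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)  # placeholder checkpoint

args = TrainingArguments(
    output_dir="out",                # placeholder
    per_device_train_batch_size=64,
    num_train_epochs=3,              # placeholder
)

trainer = Trainer(model=model, args=args, train_dataset=train_dataset)
trainer.train()
```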


The training time is too long.

Are you using a GPU?


Yes, it's an A100 with 40 GB of memory. Is this the problem?

It's just that I can't see where you have loaded the model to CUDA.


Wait, is this the reason it runs so slowly? This is how I load the model.

I need to check on this; I thought the model would automatically be placed on the CUDA device when I load it. Thanks.

And this is what nvidia-smi shows after I've already loaded the model. It seems it isn't running on the GPU.

I don't understand; when I run the program on my own server and load the model, nvidia-smi shows the GPU already has it.

Wait, it wasn't. I need to load the model onto the CUDA device first. Thanks a lot! Really appreciate it!
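Concretely, something like this (a sketch; the checkpoint name is a placeholder):

```python
import torch
from transformers import BertForSequenceClassification

# from_pretrained loads the weights onto the CPU by default; a manual
# training loop has to move the model (and each batch) to the GPU itself,
# whereas the Trainer does this placement for you.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=2)  # placeholder checkpoint
model.to(device)
```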

Man, thanks, it worked out; it now runs at the correct speed. But the very first question still confuses me: the data shape problem. It's running in an incorrect way now. The input_ids shape should be (batch_size, seq_length), but the model kept telling me "too many values to unpack", so I could only squeeze it and run it the wrong way. Do you have any idea about this? Big thanks!

Try printing the input shape and you will see why you got the ValueError. "Too many values to unpack" occurs when, say, your list has 3 items but you give only 1 or 2 variables to unpack into.

E.g. my_list = ['item1', 'item2', 'item3']

var1, var2 = my_list gives a "too many values to unpack" error.
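To make it concrete for your case, my guess (only a guess, since I can't see your code) is that each element kept a leading size-1 batch axis from tokenization, so batching stacked them into 3-D input_ids:

```python
import torch

# BERT expects input_ids of shape (batch_size, seq_length). If each example
# already carries a leading size-1 axis, the collator stacks the examples
# into shape (batch_size, 1, seq_length), and the unpack in modeling_bert.py
# ("batch_size, seq_length = input_shape") sees three values instead of two.
input_ids = torch.ones(8, 1, 64, dtype=torch.long)  # doubly-batched by mistake

try:
    batch_size, seq_length = input_ids.size()
except ValueError as err:
    print(err)  # too many values to unpack (expected 2)

input_ids = input_ids.squeeze(1)  # drop the spurious axis -> shape (8, 64)
batch_size, seq_length = input_ids.size()
print(batch_size, seq_length)  # 8 64
```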

As regards the batch size: it's not 2 just because there are two sentences to compare. The batch size is the number of examples you want to process in parallel, in this case similarity computations, and each example is one sentence pair packed into a single sequence (that is what the 0/1 pattern in your token_type_ids marks). It can be 2 or 4 or 64, as in your training arguments, to the extent your GPU/CPU can handle.


Really appreciate your suggestion, man; I realized where I got it wrong.
I've already made it all work. Thanks again, have a great one! :grin: :people_hugging:


This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.