I'm encountering a problem with a sentence similarity classification task using a Transformer model.
It happens when I'm about to train my model. I load and process my dataset as follows, and this is what a training dataset element looks like:
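The preprocessing is roughly this (a simplified sketch; the checkpoint name is a placeholder, and sentence_pairs / labels stand in for whatever I read from my data files):

import tensorflow as tf
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")  # placeholder checkpoint

def encode_pair(sent1, sent2, label):
    # Tokenize the two sentences together: token_type_ids is 0 for the first
    # sentence and 1 for the second, with [SEP] tokens in between and at the end.
    enc = tokenizer(
        sent1, sent2,
        truncation=True, max_length=64, padding="max_length",
        return_tensors="tf",  # each returned tensor has shape (1, 64)
    )
    features = {k: v for k, v in enc.items()}
    features["labels"] = tf.constant(label, dtype=tf.int64)
    return features

# sentence_pairs and labels are placeholders for my real data; the per-pair
# feature dicts are then assembled into the training tf.data.Dataset.
examples = [encode_pair(s1, s2, y) for (s1, s2), y in zip(sentence_pairs, labels)]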
But when I train the model, it says:
ValueError: Caught ValueError in replica 0 on device 0.
Original Traceback (most recent call last):
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/nn/parallel/parallel_apply.py", line 85, in _worker
output = module(*input, **kwargs)
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 1564, in forward
outputs = self.bert(
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/fanshaoqi/anaconda3/envs/tf2/lib/python3.10/site-packages/transformers/models/bert/modeling_bert.py", line 967, in forward
batch_size, seq_length = input_shape
ValueError: too many values to unpack (expected 2)
So I tried to squeeze out the extra dimension.
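Roughly like this (a sketch of what I do; the exact map call in my code may differ slightly):

import tensorflow as tf

# Drop the leading dimension of size 1 from every feature tensor, so each one
# becomes (64,) instead of (1, 64); the scalar label is left untouched.
train_dataset = train_dataset.map(
    lambda ex: {k: (tf.squeeze(v, axis=0) if k != "labels" else v)
                for k, v in ex.items()}
)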
The training element looks like this now:
{'input_ids': <tf.Tensor: shape=(64,), dtype=int64, numpy=
array([ 101, 2769, 4761, 6887, 8024, 852, 3221, 800, 812, 3680, 2399,
6963, 833, 2214, 6407, 8024, 3680, 2399, 6963, 833, 6158, 2837,
2461, 8024, 6821, 1922, 2694, 6111, 749, 8024, 1728, 711, 800,
812, 6375, 872, 2828, 102, 1315, 886, 800, 812, 1038, 6387,
872, 2828, 2124, 2372, 6822, 1343, 8024, 800, 812, 3297, 5303,
6820, 3221, 833, 6158, 2803, 1139, 1343, 511, 102])>,
'token_type_ids': <tf.Tensor: shape=(64,), dtype=int64, numpy=
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])>,
'attention_mask': <tf.Tensor: shape=(64,), dtype=int64, numpy=
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])>,
'labels': <tf.Tensor: shape=(), dtype=int64, numpy=1>}
With this, the model was able to train, but it's extremely slow!
And since this is a sentence similarity classification task, I would expect the batch_size to be 2, but it is always 1. What should it be?
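For reference, the training itself is roughly the standard Trainer setup (a sketch; the checkpoint name, output path, and argument values are placeholders for what I actually use):

from transformers import BertForSequenceClassification, Trainer, TrainingArguments

model = BertForSequenceClassification.from_pretrained("bert-base-chinese", num_labels=2)

training_args = TrainingArguments(
    output_dir="./results",           # placeholder path
    per_device_train_batch_size=32,   # example value
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,      # the processed dataset from above
)
trainer.train()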
I don't really understand why this happens. I've been struggling with it for a long time, so any help would be appreciated. Thanks!