I have sentences, each with additional features (a 5-dimensional vector of floats) and a label (True or False).
An example sample:
“hello have a nice day”, [0.2, 0.1, 0.6, 0.7, 0.2], True
I have a classification task on this data.
I want to fine-tune BERT so that it takes the input sentence, concatenates the vector to the last hidden layer, and predicts the label.
How can this be done? I can't find any code sample that concatenates additional data to the last layer before classification.
One way you could do it is by precomputing the last hidden state's [CLS] token embedding for each text in your dataset and storing them in a NumPy array. You can then concatenate this array with your additional features and train a classifier on the result.
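If it helps, here is a minimal sketch of that approach (assuming the Hugging Face `transformers` library; `bert-base-uncased`, the helper name `extract_cls_embeddings`, and the second example sentence are my own choices, not from your data). The demo at the bottom substitutes a random array for the real embeddings so the concatenation step can be run without downloading the model:

```python
import numpy as np

def extract_cls_embeddings(sentences, model_name="bert-base-uncased"):
    """Run sentences through BERT and keep the [CLS] token's last hidden state."""
    import torch
    from transformers import AutoTokenizer, AutoModel

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    with torch.no_grad():
        batch = tokenizer(sentences, padding=True, truncation=True,
                          return_tensors="pt")
        out = model(**batch)
    # last_hidden_state has shape (batch, seq_len, 768); position 0 is [CLS]
    return out.last_hidden_state[:, 0, :].numpy()

sentences = ["hello have a nice day", "what a bad day"]
extra_features = np.array([[0.2, 0.1, 0.6, 0.7, 0.2],
                           [0.9, 0.3, 0.1, 0.0, 0.5]], dtype=np.float32)

# Stand-in for extract_cls_embeddings(sentences) so this sketch runs offline:
cls_embeddings = np.random.rand(len(sentences), 768).astype(np.float32)

# Concatenate along the feature axis: (2, 768) + (2, 5) -> (2, 773)
X = np.concatenate([cls_embeddings, extra_features], axis=1)
labels = np.array([1, 0])  # True/False encoded as 1/0
```

`X` and `labels` can then be fed to any downstream classifier (logistic regression, a small MLP, etc.).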
On a side note, you may want to rescale your additional features to the scale of the BERT embeddings.
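One simple way to do that rescaling is per-column standardization; this is just a sketch of the idea (scikit-learn's `StandardScaler` would do the same job):

```python
import numpy as np

def standardize(features):
    """Rescale each column to zero mean and unit variance."""
    mu = features.mean(axis=0)
    sigma = features.std(axis=0)
    sigma[sigma == 0] = 1.0  # guard against constant columns
    return (features - mu) / sigma

features = np.array([[1.0, 2.0],
                     [3.0, 2.0],
                     [5.0, 2.0]])
scaled = standardize(features)
```

Fit the scaling statistics on the training split only, then reuse them for validation and test data.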