How to concatenate additional features to the last layer of BERT

Hi,
I have sentences, each with additional features (a 5-dimensional vector of floats) and a label (True or False).
So an example sample would be:
“hello have a nice day”, [0.2, 0.1, 0.6, 0.7, 0.2], True
I have a classification task on my data.
I want to fine-tune BERT to take the input sentence, concatenate the vector to the last hidden layer, and predict the label.
How can this be done? I can't find any code sample that concatenates additional data to the last layer before classification.

Thanks!

One way you could do it is by precomputing the CLS token embedding of the last hidden state for each text in your dataset and storing them in a NumPy array. Then you could concatenate this array with your desired additional features to perform the classification task.
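A minimal sketch of that approach, assuming a bert-base-uncased checkpoint and a scikit-learn classifier on top (all variable names are placeholders):

import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

# Toy data in the same shape as the question
sentences = ["hello have a nice day", "another example sentence"]
extra_features = np.array([[0.2, 0.1, 0.6, 0.7, 0.2],
                           [0.3, 0.4, 0.1, 0.9, 0.5]], dtype=np.float32)
labels = np.array([1, 0])

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

# Precompute the CLS token embedding of the last hidden state for every text
with torch.no_grad():
    inputs = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")
    cls_embeddings = model(**inputs).last_hidden_state[:, 0, :].numpy()  # (n, 768)

# Concatenate the frozen CLS embeddings with the extra features and fit any classifier
features = np.concatenate([cls_embeddings, extra_features], axis=1)
clf = LogisticRegression(max_iter=1000).fit(features, labels)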

On a side note, you may want to rescale your additional features to the scale of the BERT embeddings.

Hi,
I want it to be a wrapper class around BERT, so I will be able to use it with the Trainer, etc.

Hi,

This question has been answered here: Combine BertForSequenceClassificaion with Additional Features - #3 by nielsr
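For reference, a rough sketch of what such a wrapper model could look like (this is my own paraphrase of the approach from that thread, not the exact notebook code; names such as CustomSequenceClassification, extra_data and num_extra_dims are illustrative):

import torch
import torch.nn as nn
from transformers import BertModel, BertPreTrainedModel

class CustomSequenceClassification(BertPreTrainedModel):
    def __init__(self, config, num_extra_dims=5):
        super().__init__(config)
        self.num_labels = config.num_labels
        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        # The classifier sees the CLS embedding concatenated with the extra features
        self.classifier = nn.Linear(config.hidden_size + num_extra_dims, config.num_labels)
        self.post_init()

    def forward(self, input_ids=None, attention_mask=None, token_type_ids=None,
                extra_data=None, labels=None):
        outputs = self.bert(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids)
        cls_output = outputs.last_hidden_state[:, 0, :]              # (batch, hidden_size)
        combined = torch.cat([cls_output, extra_data.float()], dim=-1)
        logits = self.classifier(self.dropout(combined))
        if labels is not None:
            loss = nn.CrossEntropyLoss()(logits, labels)
            return {"loss": loss, "logits": logits}
        return {"logits": logits}

Because the forward signature accepts extra_data and labels, the Trainer can feed those columns straight from the tokenized dataset.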

@nielsr Thanks, it is working great, just a few questions about your notebook:

  1. In CustomSequenceClassification, why do you need to call post_init()?
  2. Let's say that in the input, instead of a single sentence I want to insert two sentences (“This is a sentence 1”, “This is a sentence 2”), how would you do it?

Also, if you could please help with an additional question I just posted, it would be highly appreciated…

Thanks

Hi,

  1. In CustomSequenceClassification, why do you need to call post_init()?

The post_init method takes care of initializing all the weights as defined in the _init_weights method of the xxxPreTrainedModel. See e.g. the _init_weights method of BERT here.
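For reference, BERT's _init_weights is roughly the following (paraphrased from modeling_bert.py, so check the linked source for the exact current version; nn here is torch.nn):

def _init_weights(self, module):
    # Linear and embedding weights are drawn from N(0, config.initializer_range),
    # LayerNorm is reset to weight 1 / bias 0
    if isinstance(module, nn.Linear):
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        if module.padding_idx is not None:
            module.weight.data[module.padding_idx].zero_()
    elif isinstance(module, nn.LayerNorm):
        module.bias.data.zero_()
        module.weight.data.fill_(1.0)

Without that call, any layers you add on top of the pretrained backbone keep PyTorch's default initialization instead.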

  2. Let's say that in the input, instead of a single sentence I want to insert two sentences (“This is a sentence 1”, “This is a sentence 2”), how would you do it?

You can leverage the tokenizer for that, as it supports pairs of sentences in addition to single sentences. Just like this:

inputs = tokenizer(sentence_1, sentence_2, padding=True, return_tensors="pt")

This will then look like [CLS] sentence_1 [SEP] sentence_2 [SEP] in terms of tokens.
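A quick way to check this, assuming a bert-base-uncased tokenizer:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
inputs = tokenizer("This is a sentence 1", "This is a sentence 2", return_tensors="pt")

print(tokenizer.decode(inputs["input_ids"][0]))
# [CLS] this is a sentence 1 [SEP] this is a sentence 2 [SEP]

# token_type_ids mark which segment each token belongs to (0 for the first sentence, 1 for the second)
print(inputs["token_type_ids"])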

  1. If the line

init_weights()

were removed, what would the effect be? Does this initialize the weights of the classification head? Are they not automatically initialized?

  2. Should I change it to:

ds = Dataset.from_dict({"text": [["This is a sentence1", "This is a sentence2"]]*100, "extra_data": np.random.randint(0, 10, size=(100, 5)), "labels": np.random.randint(0, 2, size=(100,))})
tokenized_ds = ds.map(lambda x: tokenizer(x["text"][0], x["text"][1]))
?