Let’s say I have a BERT model that was pretrained on a large custom dataset, using the usual MLM and NSP objectives.
The way I understand NSP to work is: you take the embedding corresponding to the [CLS] token from the final layer and pass it through a linear layer that reduces it to 2 dimensions. Then you apply a softmax on top to get a prediction of whether the two sentences are consecutive or not.
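To make sure I’m describing the same architecture, here is a rough sketch of what I have in mind (PyTorch; the name `NSPHead` and the hidden size of 768 are just my placeholders for the bert-base setup):

```python
import torch
import torch.nn as nn

class NSPHead(nn.Module):
    """Tiny NSP head: final-layer [CLS] embedding -> 2-way prediction."""
    def __init__(self, hidden_size: int = 768):  # 768 = bert-base hidden size
        super().__init__()
        self.classifier = nn.Linear(hidden_size, 2)

    def forward(self, cls_embedding: torch.Tensor) -> torch.Tensor:
        # cls_embedding: (batch, hidden_size), the [CLS] vector from the last layer
        logits = self.classifier(cls_embedding)   # (batch, 2)
        return torch.softmax(logits, dim=-1)      # P(is next), P(not next)
```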
Now, the published pretrained model I have does not include this “NSP head”, so I have to train one myself. How do I do this? Since I presume the only parameters I’ll need to train are those of the linear layer, will a small dataset be enough?
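Concretely, is something like the following the right idea: freeze the pretrained encoder and update only the new linear layer? (In this sketch I use the stock `bert-base-uncased` checkpoint and a toy sentence pair purely as stand-ins for my own model and dataset, and I follow the convention that label 0 means “sentence B follows sentence A”.)

```python
import torch
from torch import nn
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # stand-in for my checkpoint
encoder = BertModel.from_pretrained("bert-base-uncased")
for p in encoder.parameters():
    p.requires_grad = False  # only the new head gets trained

head = nn.Linear(encoder.config.hidden_size, 2)
optimizer = torch.optim.AdamW(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# One training step on a toy "consecutive sentences" example (label 0 = is-next).
enc = tokenizer("The cat sat down.", "Then it fell asleep.", return_tensors="pt")
label = torch.tensor([0])

with torch.no_grad():  # encoder is frozen, no need for its gradients
    outputs = encoder(**enc)
cls_vec = outputs.last_hidden_state[:, 0]  # final-layer [CLS] embedding

logits = head(cls_vec)
loss = loss_fn(logits, label)
loss.backward()
optimizer.step()
```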
Why is the convention to throw away this NSP head? Isn’t it a useful thing to publish for others to use?