I’m trying to use a BERT-based model, jeniya/BERTOverflow (on the Hugging Face Hub), to do Next Sentence Prediction (NSP). It’s essentially a BERT model that has been pretrained on StackOverflow data.
Now, to pretrain it, they presumably used the Next Sentence Prediction objective. But when I call
AutoModelForNextSentencePrediction.from_pretrained("jeniya/BERTOverflow"), I get a warning saying:
Some weights of BertForNextSentencePrediction were not initialized from the model checkpoint at jeniya/BERTOverflow and are newly initialized: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias'] You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
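For reference, this is the minimal snippet I’m running that triggers the warning (nothing here is specific to my setup beyond the checkpoint name):

```python
from transformers import AutoModelForNextSentencePrediction, AutoTokenizer

# The checkpoint doesn't contain cls.seq_relationship.* weights, so
# transformers initializes the NSP head randomly and prints the warning above.
tokenizer = AutoTokenizer.from_pretrained("jeniya/BERTOverflow")
model = AutoModelForNextSentencePrediction.from_pretrained("jeniya/BERTOverflow")
```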
Now, I get that the message is telling me that the NSP head does not come with this model and so has been initialized randomly. My question is: if they published a BERT model pretrained on some custom data, shouldn’t they also have used an NSP head as part of their pretraining objective? If so, where did that head go? Did they just throw it away?
Either way, how would I go about getting this custom model to work for NSP? Should I pretrain the whole goddamn thing again, but this time keep the NSP head? Or can I simply load it with AutoModel, extract the [CLS] token representation, put an MLP on top of that, and train it with a few examples to do NSP (see the sketch below)? The former is infeasible given the compute requirements, and I feel like the latter is just wrong. Am I missing something?
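To make the second option concrete, here’s roughly what I have in mind. This is just a rough sketch, untested; the class name, the single-linear-layer head, and the example sentence pair are my own placeholders:

```python
import torch.nn as nn
from transformers import AutoModel, AutoTokenizer

class NSPClassifier(nn.Module):
    """Binary sentence-pair classifier: a linear head on top of the [CLS] representation."""
    def __init__(self, checkpoint="jeniya/BERTOverflow"):
        super().__init__()
        self.encoder = AutoModel.from_pretrained(checkpoint)
        self.classifier = nn.Linear(self.encoder.config.hidden_size, 2)  # isNext / notNext

    def forward(self, input_ids, attention_mask, token_type_ids=None):
        outputs = self.encoder(input_ids=input_ids,
                               attention_mask=attention_mask,
                               token_type_ids=token_type_ids)
        cls_repr = outputs.last_hidden_state[:, 0]  # [CLS] token representation
        return self.classifier(cls_repr)

tokenizer = AutoTokenizer.from_pretrained("jeniya/BERTOverflow")
model = NSPClassifier()

# Encode a sentence pair the way BERT's NSP pretraining does:
# [CLS] sentence A [SEP] sentence B [SEP]
enc = tokenizer("How do I read a file in Python?",
                "Use open() with a context manager.",
                return_tensors="pt")
logits = model(**enc)  # shape (1, 2); I'd fine-tune this with cross-entropy on labeled pairs
```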
Any help would be greatly appreciated! Thank you!