Which objective should I use to pre-train a BERT model?

I am going to pre-train a BERT model on a specific dataset, aiming for sentiment analysis.
To self-train the model, which method would be better: Masked Language Modeling or Next Sentence Prediction? Or maybe there is no definitive answer.


The choice depends on what you want to do.

  • Masked language modeling is a good choice when you want strong representations of the data the model is trained on, which is what a downstream task like sentiment analysis needs (see the sketch after this list).
  • Next sentence prediction, or rather causal language modeling (as in GPT), is a better fit when you want to focus on generation.
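
If you go the MLM route, continued pre-training on your own corpus is straightforward with 🤗 Transformers. Here is a minimal sketch; the base checkpoint, the file name `domain_corpus.txt`, the output directory, and all hyperparameters are placeholder assumptions, not anything from this thread:

```python
# Minimal sketch: continued pre-training of BERT with masked language modeling.
# Assumes a hypothetical plain-text file "domain_corpus.txt", one example per line.
from datasets import load_dataset
from transformers import (
    AutoModelForMaskedLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

dataset = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# The collator randomly masks tokens and builds the MLM labels on the fly.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

training_args = TrainingArguments(
    output_dir="bert-domain-mlm",
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()

# Save the adapted checkpoint (and tokenizer) for later fine-tuning.
trainer.save_model("bert-domain-mlm")
tokenizer.save_pretrained("bert-domain-mlm")
```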

The course has a section on how to fine-tune a masked language model that could be interesting to you: Main NLP tasks - Hugging Face Course.
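
Once you have the domain-adapted checkpoint, sentiment analysis itself is a standard sequence classification fine-tune on top of it. A hypothetical follow-on, assuming the `bert-domain-mlm` directory saved above and a binary label set:

```python
# Load the domain-adapted checkpoint with a fresh classification head.
# "bert-domain-mlm" and num_labels=2 are placeholder assumptions.
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-domain-mlm")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-domain-mlm", num_labels=2  # e.g. positive / negative
)
# From here, train with the Trainer on your labeled sentiment data,
# exactly as in a regular text-classification fine-tuning run.
```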
