Which BERT for Semantic Change (Methodology)

Hello :hugs: Community.

I’d like to build a BERT LM to measure semantic change in semantic fields made up of specific tokens (comparing texts that are a few decades apart) for a bachelor thesis. However, I am unsure which of the models (BertForTokenClassification, BertForMaskedLM, etc.) I should be using, since I am aiming to embed single tokens. For now, what I would like to do is:

  1. Load pre-trained bert model from huggingface (done)
  2. Continue training with custom input texts (still unsure how to do that; see the sketch after this list)
  3. Create embeddings for target tokens
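For step 2, here is a minimal sketch of what continued MLM training on your own texts could look like with the Trainer API. This is just my rough idea, not a confirmed recipe: the file name `period_1950s.txt`, the output directory, and the hyperparameters are placeholders you would swap for your own corpus and settings.

```python
# Rough sketch of step 2: continue MLM training on a custom period corpus.
# File names and hyperparameters are assumptions, not from the original post.
from datasets import load_dataset
from transformers import (
    BertTokenizerFast,
    BertForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Assumed layout: one plain-text file per time period, one document per line.
dataset = load_dataset("text", data_files={"train": "period_1950s.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# The collator applies dynamic masking, so training continues on the MLM objective.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-1950s",
    num_train_epochs=3,
    per_device_train_batch_size=16,
)

Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
).train()
```

You would run this once per period corpus (or once on the combined corpus, depending on the design), then extract embeddings from the resulting checkpoint(s) for step 3.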

Section 4.3 of this tutorial says that the vectors BERT creates might not be suitable for similarity metrics (a problem I’ll probably try to tackle with graphs later). There is also a GitHub repo that implements BERT clustering and computes different judgement scores; they appear to be using torch tensors. At this point I’ve been through so many articles, tutorials and videos that it feels like I have lost the plot. Which model would be the appropriate one to use?
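For step 3, this is roughly how I imagine getting a per-token vector and a simple cosine comparison between two periods, assuming the target word is a single wordpiece in the vocabulary. The sentences and the target word "awful" are made-up examples, and in practice you would load the continued-pretrained checkpoint from step 2 instead of the base model.

```python
# Sketch of step 3: average the contextual vectors of a target word over its
# occurrences in a corpus, then compare two periods with cosine similarity.
# Example sentences and the target word are illustrative assumptions.
import torch
from transformers import BertTokenizerFast, BertModel

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

def word_vector(sentences, target):
    """Mean contextual embedding of `target` across the given sentences."""
    target_id = tokenizer.convert_tokens_to_ids(target)
    vectors = []
    for sent in sentences:
        inputs = tokenizer(sent, return_tensors="pt", truncation=True)
        with torch.no_grad():
            hidden = model(**inputs).last_hidden_state[0]  # (seq_len, hidden_size)
        for pos, tok_id in enumerate(inputs["input_ids"][0]):
            if tok_id == target_id:
                vectors.append(hidden[pos])
    return torch.stack(vectors).mean(dim=0)

old = word_vector(["The storm was awful and terrifying."], "awful")
new = word_vector(["That concert was awful, I left early."], "awful")
print(torch.cosine_similarity(old.unsqueeze(0), new.unsqueeze(0)).item())
```

Whether raw cosine similarity over these vectors is meaningful is exactly the concern from section 4.3 of the tutorial, so the clustering- or graph-based judgement scores could be layered on top of the same per-occurrence vectors.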

I think I got it: MLM (masked language modelling), i.e. BertForMaskedLM, is a suitable model for this task.