I have a custom text dataset that I want BERT to get acquainted with. My final goal is not to run any supervised task (it is actually to serve as a starting point for getting sentence embeddings from S-BERT). I just want to continue the unsupervised pre-training on my dataset. How do I do this?
So far, I have come across two possible candidates in the documentation for this:
- BertForPreTraining (the self-explanatory name led me to this)
- BertForMaskedLM (as used in this blog post)
Can both of them be used for this purpose? Is one better suited to my goal? Has anyone here tried something like this before? Any additional suggestions would also be very helpful.
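For context, here is a rough sketch of what I was planning to try with BertForMaskedLM and the Trainer API (the corpus path, output directory, and hyperparameters are placeholders I made up, not tested values):

```python
from datasets import load_dataset
from transformers import (
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Plain-text file with one sentence/document per line (placeholder path).
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset["train"].map(tokenize, batched=True, remove_columns=["text"])

# Dynamically masks 15% of tokens per batch (the standard MLM objective).
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-domain-adapted",   # placeholder
    per_device_train_batch_size=16,     # guess; depends on GPU memory
    num_train_epochs=3,
    save_steps=1000,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
trainer.save_model("bert-domain-adapted")
```

Does this look like a reasonable setup, or would BertForPreTraining (with the additional next-sentence-prediction head) be the better choice here?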
Thank you