I have a short conceptual question. I know I can train a masked language model (MLM) from scratch. By doing so with Hugging Face, I should be able to obtain a model that is very good at … filling the mask.
But what about the embeddings? Are they any good for clustering, for instance? Note that I am NOT fine-tuning the MLM in any way. I am only interested in the embeddings that come from the MLM task itself.
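To be concrete about what I mean by "the embeddings", here is a minimal sketch. It assumes I take the model's last hidden states and mean-pool them over the non-padding tokens to get one vector per sentence; other pooling choices are possible. The arrays are toy stand-ins for real model output, and `mean_pool` and `cosine_similarity_matrix` are hypothetical helper names of my own, not Hugging Face functions.

```python
import numpy as np

def mean_pool(hidden_states, attention_mask):
    """Average token vectors per sentence, ignoring padding positions.

    hidden_states: (batch, seq_len, dim) array, e.g. the MLM's last hidden layer.
    attention_mask: (batch, seq_len) array with 1 for real tokens, 0 for padding.
    """
    mask = attention_mask[:, :, None].astype(hidden_states.dtype)
    summed = (hidden_states * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # guard against empty rows
    return summed / counts

def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarity between sentence vectors."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return unit @ unit.T

# Toy stand-in for model output: 2 sentences, 3 token positions, 2 dimensions.
hidden = np.array([
    [[1.0, 0.0], [3.0, 0.0], [9.0, 9.0]],  # third position is padding
    [[0.0, 2.0], [0.0, 4.0], [0.0, 6.0]],
])
mask = np.array([[1, 1, 0], [1, 1, 1]])

sentence_embeddings = mean_pool(hidden, mask)
similarity = cosine_similarity_matrix(sentence_embeddings)
```

Vectors like `sentence_embeddings` are what I would then pass to a clustering algorithm such as k-means.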
Any suggestions or papers greatly appreciated.