Hello there,
I have a short conceptual question. I know I can train a masked language model (MLM) from scratch. By doing so with Hugging Face, I should be able to obtain a model that is very good at… filling the [mask] token!
But what about the embeddings? Are they any good for clustering, for instance? Note that I am NOT fine-tuning the MLM in any way. I am only interested in the embeddings that come from the MLM pre-training task itself.
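To make the question concrete, here is a minimal sketch of what I mean by "using the embeddings": take the final hidden states from the trained MLM encoder, mean-pool them over the non-padding tokens, and cluster the pooled vectors. The `output_hidden_states` call in the comment is the standard `transformers` API; the random `hidden` array below is just a stand-in so the pooling step is self-contained:

```python
import numpy as np

# Toy stand-in for the final hidden states of an MLM encoder.
# In practice these would come from something like:
#   out = model(**batch, output_hidden_states=True)
#   hidden = out.hidden_states[-1].numpy()   # (batch, seq_len, dim)
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 6, 8))          # batch=4, seq_len=6, dim=8
# 1 = real token, 0 = padding
attention_mask = np.array([
    [1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1],
    [1, 1, 0, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
])

def mean_pool(hidden, mask):
    """Average token vectors per sequence, ignoring padding positions."""
    m = mask[:, :, None].astype(hidden.dtype)        # (batch, seq, 1)
    return (hidden * m).sum(axis=1) / m.sum(axis=1)

sentence_vecs = mean_pool(hidden, attention_mask)    # shape (4, 8)
# These pooled vectors are what I would then feed to k-means,
# HDBSCAN, or similar.
```

My question is whether vectors obtained this way, from MLM pre-training alone, are meaningful enough for clustering.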
Any suggestions or papers greatly appreciated.
Thanks!