Obtaining word embeddings from RoBERTa

First two rules of research: just because something is common does not mean it is SOTA, and what is SOTA for one task is not necessarily SOTA for another. We still see feature-based systems outperform LMs in some cases (especially in low-resource settings).

It might work on your task. It might even work well. If your model is specifically finetuned on single words, then it should be fine. But using a pretrained model as-is (without finetuning) to feed in a single word and get its embedding… I am hesitant to recommend that, for the reason discussed above: most LMs are context-sensitive, so it makes little sense to extract context-free representations from them. Instead, I would recommend static embeddings like word2vec or GloVe.
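For completeness, here is a minimal sketch of the static-embedding route via gensim's downloader. The specific vectors ("glove-wiki-gigaword-300") are just my pick for illustration; any pretrained word2vec/GloVe set works the same way:

```python
import gensim.downloader as api

# Downloads the pretrained GloVe vectors on first use and caches them locally.
glove = api.load("glove-wiki-gigaword-300")

# Static lookup: one fixed vector per word, no context needed.
vector = glove["coffee"]
print(vector.shape)                           # (300,)
print(glove.most_similar("coffee", topn=3))   # nearest neighbors in vector space
```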

In a previous post I explained how you can extract the embedding of a given word in an input sentence by averaging its subword token embeddings (the hidden states, not the logits): Generate raw word embeddings using transformer models like BERT for downstream process - Beginners - Hugging Face Forums
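A rough sketch of that approach, as I understand it: run the sentence through the model and mean-pool the last hidden states of the subwords that make up the target word. The model choice and the use of `word_ids()` for subword-to-word alignment are my assumptions here, not verbatim from the linked post:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# add_prefix_space=True is required to feed RoBERTa's tokenizer pre-split words.
tokenizer = AutoTokenizer.from_pretrained("roberta-base", add_prefix_space=True)
model = AutoModel.from_pretrained("roberta-base")
model.eval()

sentence = "I drink a cup of coffee every morning."
target_word_index = 5  # "coffee" (0-based index into the whitespace split)

# Tokenize pre-split words so word_ids() maps each subword back to its word.
encoded = tokenizer(sentence.split(), is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    hidden_states = model(**encoded).last_hidden_state[0]  # (seq_len, 768)

# Positions of the subword tokens belonging to the target word
# (special tokens map to None, so they are skipped automatically).
positions = [i for i, w in enumerate(encoded.word_ids()) if w == target_word_index]

# Average the subword vectors into one contextual embedding for the word.
word_embedding = hidden_states[positions].mean(dim=0)
print(word_embedding.shape)  # torch.Size([768])
```

Note that this gives you a *contextual* embedding: the vector for "coffee" will differ from sentence to sentence, which is exactly the behavior you cannot get rid of without finetuning. HTH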