Accessing uncontextualized BERT word embeddings

Hi there! Once I’ve imported a BERT model from HuggingFace, is there a way to convert a sequence of encoded tokens into BERT’s raw embeddings without contextualizing them using self-attention, or otherwise extract the raw embedding for a given token?


Try this.
I think the APIs change a bit between models, so take a look before you copy-paste 🙂

    from transformers import AutoTokenizer, DistilBertForTokenClassification

    tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
    # num_labels is arbitrary here; we only use the embedding layer
    model = DistilBertForTokenClassification.from_pretrained("distilbert-base-cased", num_labels=2)
    input_ids = tokenizer("my text here", return_tensors="pt")["input_ids"]
    # Raw token embeddings from the lookup table (no self-attention applied)
    word_embeddings = model.distilbert.embeddings.word_embeddings(input_ids)
    # Token embeddings with position embeddings added (still uncontextualized)
    word_embeddings_with_positions = model.distilbert.embeddings(input_ids)
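If you'd rather not hard-code the `distilbert` attribute path, `get_input_embeddings()` is the generic accessor available on any `transformers` model, so the same approach carries over across architectures. A minimal sketch (the token string "hello" is just an example, and it assumes the `model` and `tokenizer` from above):

    # Generic accessor: returns the nn.Embedding lookup table for input tokens
    embedding_layer = model.get_input_embeddings()
    # Raw, uncontextualized embedding vector for a single token
    token_id = tokenizer.convert_tokens_to_ids("hello")
    single_token_embedding = embedding_layer.weight[token_id]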

This got me there! Thank you so much.