Hi there! Once I’ve imported a BERT model from HuggingFace, is there a way to convert a sequence of encoded tokens into BERT’s raw embeddings without contextualizing them using self-attention, or otherwise extract the raw embedding for a given token?
Try this. The APIs change a bit between models, so take a look before you copy-paste:
import torch
from transformers import DistilBertForTokenClassification

model = DistilBertForTokenClassification.from_pretrained(
    "distilbert-base-cased", num_labels=2  # set num_labels for your task
)

# Pass a tensor of token ids, not raw strings
token_ids = torch.tensor([["my token ids here"]])  # e.g. tokenizer output

# Raw lookup-table embeddings, no position info
word_embeddings = model.distilbert.embeddings.word_embeddings(token_ids)

# Word + position embeddings (plus LayerNorm and dropout)
word_embeddings_with_positions = model.distilbert.embeddings(token_ids)
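In case it helps anyone else, here is a fuller sketch that uses a tokenizer to produce the ids first. This assumes the `transformers` and `torch` packages are installed, and it uses the bare `DistilBertModel` (my choice, not from the answer above) since only the embedding layer is needed:

```python
import torch
from transformers import AutoTokenizer, DistilBertModel

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-cased")
model = DistilBertModel.from_pretrained("distilbert-base-cased")

# Tokenize a sample sentence into a (1, seq_len) tensor of ids
token_ids = tokenizer("Hello world", return_tensors="pt")["input_ids"]

with torch.no_grad():
    # Raw, non-contextual token embeddings straight from the lookup table
    raw = model.embeddings.word_embeddings(token_ids)

    # Same lookup plus position embeddings, LayerNorm, and dropout
    with_pos = model.embeddings(token_ids)

print(raw.shape)  # (1, seq_len, hidden_dim)
```

Neither call runs self-attention, so the outputs are uncontextualized per-token vectors.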
This got me there! Thank you so much.