Using BERT or other transformers for semantic neighborhood density

Hi everyone,

I want to calculate the semantic neighborhood density (SND) of words within a sentence. So far, I have used Word2Vec to get word vectors and computed cosine similarities from them, but this essentially treats words as dictionary entries rather than considering the larger context. Could anyone point me to documentation, or to ways I could use a transformer model to calculate SND in context?
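
For concreteness, here is a minimal sketch of the static baseline described above. It assumes SND is taken as the mean cosine similarity between a target word and its k nearest neighbours, which is one common operationalisation and may not match every definition. The function names and the toy vectors are hypothetical, made up for illustration; real vectors would come from a trained Word2Vec model.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def neighborhood_density(target, vectors, k=2):
    """Mean cosine similarity between `target` and its k most similar words.

    Assumption: SND = average similarity to the k nearest neighbours.
    """
    sims = sorted(
        (cosine(vectors[target], vec)
         for word, vec in vectors.items() if word != target),
        reverse=True,
    )
    top = sims[:k]
    if not top:
        raise ValueError("need at least one other word to compare against")
    return sum(top) / len(top)


# Toy 3-dimensional vectors, invented for illustration only.
toy_vectors = {
    "cat": [1.0, 0.0, 0.0],
    "kitten": [1.0, 0.0, 0.0],
    "dog": [1.0, 1.0, 0.0],
    "car": [0.0, 0.0, 1.0],
}

snd_cat = neighborhood_density("cat", toy_vectors, k=2)
print(f"SND(cat) = {snd_cat:.4f}")
```

Each word has exactly one vector here, so "cat" gets the same score in every sentence. That is the limitation in question: an in-context version would need a separate vector for each occurrence of the word.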

Thank you so much!