The occurrence of the word fits better in another example

Hi, I am beginner in HuggingFace

I am trying simple word-sense disambiguation using BertTokenizer: ‘bert-base-uncased’. I am compare word ‘rick’ occured in one of wordnet samples and my sample,
When my sample = “band playing rock”, I have results:

tensor([[0.3383]]) he threw a rock at me
tensor([[0.4018]]) that mountain is solid rock
tensor([[0.1466]]) stone is abundant in New England and there are many quarries
tensor([[0.3042]]) he was her rock during the crisis
tensor([[0.2900]]) Thou art Peter, and upon this rock I will build my church
tensor([[0.1766]]) rock is a generic term for the range of styles that evolved out of rock’n’roll.
tensor([[0.0670]]) the ship was rocking
tensor([[0.0478]]) the tall building swayed
tensor([[0.0898]]) She rocked back and forth on her feet
tensor([[0.1587]]) rock the cradle
tensor([[0.1088]]) rock the baby
tensor([[0.0740]]) the wind swayed the trees gently
Sentence: band playing rock
Closest definition: (tensor([[0.4018]]), ‘that mountain is solid rock’)

I don’t know why “rock” near “play” or “playing” (although is even “band”) prefer solid rock instead rock and roll
with roBERTa and deBERTa is similar , except that cosine similarity is above 0.99 for most pairs.
(in 4- 24 hours I can prepare gist with minimal example, if needed)

solution: ‘bert-base-cased’ is a much better:
Sentence: band playing rock
Closest definition: (tensor([[0.8216]]), “rock is a generic term for the range of styles that evolved out of rock’n’roll.”)

but is not quite WSD-friendly: “rock is a generic term for mountain” obvious fit to "“rock is a generic term for the range of styles that evolved out of rock’n’roll.” - a lot of words match
I think, the same will be with negative
Sentence: rock is not a generic term for the range of styles that evolved out of rock’n’roll.
Closest definition: (tensor([[0.9930]]), “rock is a generic term for the range of styles that evolved out of rock’n’roll.”)
(instead of -0.993)

1 Like