Distance between 2 llama tokens

I am trying to find the distance between two words or phrases, e.g. "Chinese pagodas" and "Adept".

I tried using cosine distance but couldn't get around the fact that the two inputs tokenize to different numbers of tokens. I am using Llama 3 8B Instruct.

Adept tokens: {'input_ids': tensor([[128000,     32,  41685]]), 'attention_mask': tensor([[1, 1, 1]])}
Chinese pagoda tokens: {'input_ids': tensor([[128000,  46023,  15117,  14320]]), 'attention_mask': tensor([[1, 1, 1, 1]])}

What would be the most efficient way to find how similar the 2 words are?


Have you tried spaCy?
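
One common way around the differing token counts is to pool each phrase's per-token vectors into a single fixed-size vector (for example by averaging them) and then compare the two pooled vectors with cosine similarity. Below is a minimal sketch of that idea in plain Python. The 3-dimensional vectors are made-up stand-ins for whatever the model would return per token, and `mean_pool` and `cosine_similarity` are illustrative helper names, not part of any library. Whether mean pooling over Llama 3 hidden states gives good similarity scores for this use case is an assumption that would need checking.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of tokens whose mask is 1, giving one fixed-size vector."""
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep == 1]
    if not kept:
        raise ValueError("no unmasked tokens to pool")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy per-token vectors (made-up values, not real model outputs).
# The phrases have different token counts, as in the question.
adept_vectors = [[0.2, 0.1, 0.0], [0.4, 0.3, 0.2]]                      # 2 tokens
pagoda_vectors = [[0.1, 0.0, 0.3], [0.3, 0.2, 0.1], [0.2, 0.4, 0.2]]    # 3 tokens

# After pooling, both phrases are single vectors of the same dimension.
adept = mean_pool(adept_vectors, [1, 1])
pagoda = mean_pool(pagoda_vectors, [1, 1, 1])

similarity = cosine_similarity(adept, pagoda)
print(f"cosine similarity: {similarity:.4f}")
print(f"cosine distance:   {1.0 - similarity:.4f}")
```

With a real model, the per-token vectors would be that model's outputs for each input, and the mask would be the tokenizer's `attention_mask`. Both encodings in the question start with the same id, 128000 (the begin-of-text token), so setting its mask entry to 0 before pooling keeps it from pulling the two pooled vectors toward each other.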