I am trying to find the distance between two words/phrases, e.g. "Chinese Pagodas" and "Adept".
I tried using cosine distance, but couldn't get around the fact that the two inputs tokenize to different numbers of tokens. I am using Llama 3 8B Instruct.
Adept tokens: {'input_ids': tensor([[128000, 32, 41685]]), 'attention_mask': tensor([[1, 1, 1]])}
Chinese pagoda tokens: {'input_ids': tensor([[128000, 46023, 15117, 14320]]), 'attention_mask': tensor([[1, 1, 1, 1]])}
What would be the most efficient way to measure how similar the two words/phrases are?
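
For reference, one common way around the differing token counts is to mean-pool the per-token vectors of each phrase into a single fixed-size vector and then take the cosine similarity of the two pooled vectors. Below is a minimal sketch of that idea in plain Python. The vectors are made-up toy values, not real Llama 3 outputs, and the helper names `mean_pool` and `cosine_similarity` are hypothetical. In practice the per-token vectors would presumably come from the model's hidden states (for example the last hidden layer), and it may be worth masking out the leading 128000 token, which is Llama 3's begin-of-text marker, so it does not dominate short inputs. Whether mean pooling gives good similarity scores for a decoder-only model like this is an open assumption, not something the sketch demonstrates.

```python
import math


def mean_pool(token_vectors, attention_mask):
    """Average the per-token vectors whose mask entry is 1.

    Returns one vector of fixed size, regardless of how many tokens went in.
    """
    kept = [vec for vec, keep in zip(token_vectors, attention_mask) if keep == 1]
    if not kept:
        raise ValueError("attention mask excludes every token")
    dim = len(kept[0])
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy stand-ins for per-token hidden states (hypothetical values, 3 dimensions).
# "Adept" has 3 tokens and "Chinese pagoda" has 4, as in the tokenizer output above.
adept_vectors = [[0.1, 0.3, 0.5], [0.2, 0.1, 0.4], [0.0, 0.5, 0.3]]
pagoda_vectors = [[0.1, 0.3, 0.5], [0.4, 0.2, 0.1], [0.3, 0.3, 0.2], [0.5, 0.1, 0.0]]

# Mask value 0 on the first position drops the begin-of-text token from the average.
adept_vec = mean_pool(adept_vectors, [0, 1, 1])
pagoda_vec = mean_pool(pagoda_vectors, [0, 1, 1, 1])

similarity = cosine_similarity(adept_vec, pagoda_vec)
print(f"cosine similarity: {similarity:.4f}")
```

Both pooled vectors have the same length no matter how many tokens each phrase produced, so the cosine comparison is always well defined. Cosine distance, if that is what is needed, is just `1 - similarity`.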