I’m currently working on a project to automatically detect follow-up needs in emails using natural language processing (NLP). The goal is to identify phrases or sentences in email content that indicate a request for a response or action. To achieve this, I am using Sentence-BERT to calculate the similarity between email content and a set of predefined follow-up-related phrases.
Steps I followed (sketched in code after the list):
- Remove stop words from the email content.
- Generate bigrams and trigrams from the cleaned email text.
- Encode the n-grams using Sentence-BERT.
- Compute cosine similarity between the encoded n-grams and predefined follow-up phrases.
- Determine if any similarity score exceeds a threshold (e.g., 0.7) to classify the email as needing a follow-up.
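A minimal sketch of this pipeline, assuming the `sentence-transformers` and `nltk` packages; the model name (`all-MiniLM-L6-v2`) and the 0.7 threshold are placeholders rather than necessarily the exact setup I used:

```python
from nltk.corpus import stopwords
from nltk.util import ngrams
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model

seed_phrases = ["please respond", "kindly reply",
                "awaiting your response", "follow up"]
seed_embeddings = model.encode(seed_phrases, convert_to_tensor=True)

def needs_follow_up(email_text, threshold=0.7):
    # 1. Remove stop words (requires the nltk "stopwords" corpus to be downloaded)
    stop_words = set(stopwords.words("english"))
    tokens = [t for t in email_text.lower().split() if t not in stop_words]

    # 2. Generate bigrams and trigrams from the cleaned tokens
    candidates = [" ".join(g) for n in (2, 3) for g in ngrams(tokens, n)]
    if not candidates:
        return False

    # 3. Encode the n-grams with Sentence-BERT
    candidate_embeddings = model.encode(candidates, convert_to_tensor=True)

    # 4. Cosine similarity between every n-gram and every seed phrase
    scores = util.cos_sim(candidate_embeddings, seed_embeddings)

    # 5. Flag the email if any score exceeds the threshold
    return bool(scores.max() > threshold)

print(needs_follow_up("Can you please get back to me with your feedback?"))
```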
Seed Phrases:
“please respond”
“kindly reply”
“awaiting your response”
“follow up”
Example Email Content: “Can you please get back to me with your feedback?”
Despite following the above steps, the similarity scores between the n-grams and the seed phrases are lower than expected. For example, the similarity score between “please respond” and “please get back” is only 0.44. I was hoping for higher scores given the semantic similarity between these phrases.
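For reference, the pairwise score is computed roughly like this (the model name here is an assumption, and the exact value will vary by model):

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model
emb = model.encode(["please respond", "please get back"], convert_to_tensor=True)
print(util.cos_sim(emb[0], emb[1]).item())  # much lower than I expected for a paraphrase
```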
How can I achieve higher similarity scores, or should higher similarity even be the goal here?