Need Help Improving Similarity Scores for Follow-up Detection Using Sentence-BERT or Similar Models

I’m currently working on a project to automatically detect follow-up needs in emails using natural language processing (NLP). The goal is to identify phrases or sentences in email content that indicate a request for a response or action. To achieve this, I am using Sentence-BERT to calculate the similarity between email content and a set of predefined follow-up-related phrases.
Steps I followed (a rough code sketch of these steps appears after the example email below):

  1. Remove stop words from the email content.
  2. Generate bigrams and trigrams from the cleaned email text.
  3. Encode the n-grams using Sentence-BERT.
  4. Compute cosine similarity between the encoded n-grams and predefined follow-up phrases.
  5. Determine if any similarity score exceeds a threshold (e.g., 0.7) to classify the email as needing a follow-up.

Seed Phrases:

“please respond”
“kindly reply”
“awaiting your response”
“follow up”

Example Email Content: “Can you please get back to me with your feedback?”
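
Roughly, the pipeline looks like this (a simplified sketch; the model name, tokenization, and stop word handling are stand-ins for what I actually run):

```python
import nltk
from nltk.corpus import stopwords
from sentence_transformers import SentenceTransformer, util

nltk.download("stopwords", quiet=True)

# Placeholder model name; the real pipeline uses a Sentence-BERT checkpoint
model = SentenceTransformer("all-MiniLM-L6-v2")

seed_phrases = ["please respond", "kindly reply", "awaiting your response", "follow up"]
email = "Can you please get back to me with your feedback?"

# 1. Remove stop words (very naive tokenization, just for illustration)
stop_words = set(stopwords.words("english"))
tokens = [t for t in email.lower().rstrip("?").split() if t not in stop_words]

# 2. Generate bigrams and trigrams from the cleaned text
def ngrams(tokens, n):
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

candidates = ngrams(tokens, 2) + ngrams(tokens, 3)

# 3-4. Encode the n-grams and compute cosine similarity with the seed phrases
cand_emb = model.encode(candidates, convert_to_tensor=True)
seed_emb = model.encode(seed_phrases, convert_to_tensor=True)
scores = util.cos_sim(cand_emb, seed_emb)

# 5. Classify the email as needing a follow-up if any score exceeds the threshold
needs_follow_up = bool((scores > 0.7).any())
print(candidates)
print(scores)
print(needs_follow_up)
```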

Despite following the above steps, the similarity scores between the n-grams and the seed phrases are lower than expected. For example, the similarity score between “please respond” and “please get back” is only 0.44. I was hoping for higher scores given the semantic similarity between these phrases.

How can I achieve higher similarity scores, or should higher similarity even be the goal here?

You can try a stronger model. Also, I doubt you need to remove stop words, generate bigrams/trigrams, and encode the n-grams: Sentence-BERT models are trained on full sentences, so encoding the whole email (or each of its sentences) against your seed phrases will usually give more meaningful similarity scores than scoring short, context-free fragments.
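
For example, something along these lines (the model name is just one reasonable choice, and the threshold is a placeholder you would want to tune on labelled emails):

```python
from sentence_transformers import SentenceTransformer, util

# "all-mpnet-base-v2" is one example of a stronger general-purpose model;
# any recent sentence-embedding model can be swapped in here.
model = SentenceTransformer("all-mpnet-base-v2")

seed_phrases = ["please respond", "kindly reply", "awaiting your response", "follow up"]
email = "Can you please get back to me with your feedback?"

# Encode the full sentence and the seed phrases directly, no preprocessing
email_emb = model.encode(email, convert_to_tensor=True)
seed_emb = model.encode(seed_phrases, convert_to_tensor=True)

# Cosine similarity between the email and each seed phrase, shape (1, 4)
scores = util.cos_sim(email_emb, seed_emb)
print(scores)

# Placeholder threshold; tune it on a labelled sample rather than assuming 0.7
needs_follow_up = scores.max().item() > 0.5
print(needs_follow_up)
```

If you want to keep the seed-phrase idea, compare them against whole sentences from the email rather than n-gram fragments, since that is much closer to what these models saw during training.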