Sentence similarity models not capturing opposite sentences

I have tried different models for sentence similarity, namely:

  • distilbert-base-uncased
  • bert-base-uncased
  • sentence-transformers/all-mpnet-base-v2

I used them together with the packages sentence-similarity and sentence-transformers, which simplify the programming.

I have also tried Universal Sentence Encoders (en_use_md and en_use_cmlm_lg).

However, while these models generally detect similarity correctly for equivalent sentences, they all fail when given negated sentences. For example, these opposite sentences:

  • “I like rainy days because they make me feel relaxed.”
  • “I don’t like rainy days because they don’t make me feel relaxed.”

return a similarity of 0.993 with the model distilbert-base-uncased.

However, sentences that could be considered very similar:

  • “I like rainy days because they make me feel relaxed.”
  • “I enjoy rainy days because they make me feel calm.”

return a similarity of 0.996, which is barely higher.
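For reference, here is a minimal sketch of how such cosine-similarity scores can be computed with sentence-transformers (the exact values will of course vary with the model and pooling used):

```python
from sentence_transformers import SentenceTransformer, util

# Any of the models listed above can be plugged in here.
model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")

pairs = [
    ("I like rainy days because they make me feel relaxed.",
     "I don't like rainy days because they don't make me feel relaxed."),
    ("I like rainy days because they make me feel relaxed.",
     "I enjoy rainy days because they make me feel calm."),
]

for sentence_a, sentence_b in pairs:
    embeddings = model.encode([sentence_a, sentence_b], convert_to_tensor=True)
    score = util.cos_sim(embeddings[0], embeddings[1]).item()
    print(f"{score:.3f}  |  {sentence_b}")
```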

To my understanding, opposite sentences should have a small similarity, especially when semantics are taken into account.

My question is: Are there any models/approaches that are able to capture the affirmative/negative nature of sentences when calculating similarity?

This relates back to a discussion that took place on Twitter not too long ago. The problem is that “similarity” is ill-defined. You can read through the thread; I did not add much, but the discussion between Nils Reimers and Yoav Goldberg is interesting.

It is a good mind exercise to think beyond what you’d want “similarity” to mean and consider what the models are actually paying attention to. In your example, it is likely that the content words receive the most attention and are responsible for much of the “meaning” (the representation vector). On top of that, lexical overlap inevitably contributes to the score as well: “(rainy) days” and “because they make me feel” already overlap. Yes, in terms of semantics and the whole context the meaning is different, but for the model these sentences are very “similar”. These models do not (necessarily) perform sentiment analysis and comparison, which seems to be what you are after.

You may wish to look for sentiment models instead.

Thank you for your answer. The Twitter thread was very interesting.

Actually, I’m not really after sentiment analysis; the models also fail with factual sentences that have opposite meanings. What I would like to achieve is inferring when two sentences are opposite and assigning them a low similarity score (again, this falls within the discussion of what “similarity” means, since someone could argue that two sentences that differ only in a “not” or a “don’t” are still very similar).

Hey dmlls,
I’m also interested in a similar problem and am trying to find a way to distinguish between two opposite sentences. So far, I have created a fine-tuned Sentence-BERT model that has shown some improvement, but it is still far from what we want. I would like to know: have you made any progress in this area, or do you have any suggestions I could try to tackle this issue?
Thank you.

Hi @yzm0034,

I thought of some obvious but simple approaches, such as writing a regex-based sentence-negation algorithm in order to bootstrap a labeled dataset.

This algorithm would look at sentences containing “easy-to-negate” words/spans, such as modal verbs (can ↔ can’t / cannot, will ↔ won’t / will not, should ↔ shouldn’t / should not, etc.) or auxiliary verbs (is ↔ isn’t / is not, etc.), and replace them with their opposite counterparts.
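A minimal sketch of what I have in mind (purely illustrative; the pattern list is far from exhaustive):

```python
import re

# Ordered list of (pattern, replacement): negated forms come first so that,
# e.g., "can't" is handled before the bare "can" pattern can match inside it.
NEGATION_PAIRS = [
    (r"\bcannot\b", "can"),
    (r"\bcan't\b", "can"),
    (r"\bcan\b", "cannot"),
    (r"\bwon't\b", "will"),
    (r"\bwill not\b", "will"),
    (r"\bwill\b", "will not"),
    (r"\bshouldn't\b", "should"),
    (r"\bshould not\b", "should"),
    (r"\bshould\b", "should not"),
    (r"\bisn't\b", "is"),
    (r"\bis not\b", "is"),
    (r"\bis\b", "is not"),
]

def negate(sentence: str) -> str:
    """Swap the first 'easy-to-negate' span with its opposite counterpart."""
    for pattern, replacement in NEGATION_PAIRS:
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            return re.sub(pattern, replacement, sentence, count=1, flags=re.IGNORECASE)
    return sentence  # nothing easy to negate; leave the sentence unchanged

print(negate("It is raining."))  # -> "It is not raining."
print(negate("I can't swim."))   # -> "I can swim."
```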

This idea could be taken further by performing POS tagging, finding the verb (e.g., like) and negating it (don't/doesn't like).
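A rough sketch of that idea, assuming spaCy and its small English model (en_core_web_sm) are available (again, only illustrative):

```python
import spacy

nlp = spacy.load("en_core_web_sm")

def negate_main_verb(sentence: str) -> str:
    """Negate the first main verb found, e.g. 'like' -> "don't like"."""
    doc = nlp(sentence)
    verb = next((token for token in doc if token.pos_ == "VERB"), None)
    if verb is None:
        return sentence  # no main verb found; leave the sentence unchanged
    # Pick the auxiliary based on the verb's fine-grained tag (3rd person / past / default).
    aux = {"VBZ": "doesn't", "VBD": "didn't"}.get(verb.tag_, "don't")
    pieces = []
    for token in doc:
        text = f"{aux} {verb.lemma_}" if token.i == verb.i else token.text
        pieces.append(text + token.whitespace_)
    return "".join(pieces)

print(negate_main_verb("I like rainy days because they make me feel relaxed."))
# -> "I don't like rainy days because they make me feel relaxed."
```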

Obviously, things are not always that easy. For example, applying the previous approach to “There are some issues” would result in “There are not some issues”, which sounds odd (one would say “There are not any issues” or simply “There are no issues”). Additional regular expressions could be written for these cases.

However, the generated dataset probably doesn’t need to cover every single case in order to improve over the current baseline 🙂

You could try a sentiment classifier on top of a semantic similarity model: semantically similar but opposite in sentiment. This could be an idea.
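A minimal sketch of that idea (the 0.5 penalty is arbitrary and only meant to illustrate combining the two signals):

```python
from sentence_transformers import SentenceTransformer, util
from transformers import pipeline

similarity_model = SentenceTransformer("sentence-transformers/all-mpnet-base-v2")
sentiment_classifier = pipeline("sentiment-analysis")  # default sentiment model

def sentiment_aware_similarity(sentence_a: str, sentence_b: str) -> float:
    embeddings = similarity_model.encode([sentence_a, sentence_b], convert_to_tensor=True)
    score = util.cos_sim(embeddings[0], embeddings[1]).item()
    label_a = sentiment_classifier(sentence_a)[0]["label"]
    label_b = sentiment_classifier(sentence_b)[0]["label"]
    # Penalize pairs that are semantically similar but opposite in sentiment.
    return score - 0.5 if label_a != label_b else score

print(sentiment_aware_similarity(
    "I like rainy days because they make me feel relaxed.",
    "I don't like rainy days because they don't make me feel relaxed.",
))
```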

This is one of the topics I am interested in. I would classify it as “negation” contextual understanding. We might not have anything in the general natural language domain other than some datasets in the medical domain. Let me know if anyone finds one.

Hi @hashemi786, we recently published a paper digging into the topic: This is not correct! Negation-aware Evaluation of Language Generation Systems.

The paper introduces the CANNOT dataset, which focuses on negated textual pairs. It currently contains 77,376 samples, of which roughly half are negated pairs of sentences, and the other half are not (they are paraphrased versions of each other).

We also released the model NegBLEURT, finetuned on the CANNOT dataset, which makes it significantly more sensitive to negations than its base model. Additionally, I finetuned a Sentence Transformer: all-mpnet-base-v2-negation (it can be used with the sentence-transformers module). Again, it deals much better with negations.


Thanks for your prompt response. I was able to try NegBLEURT, NegMPNet, and clinical-assertion-negation-bert.

I was not able to try out all-mpnet-base-v2-negation, as the config file/actual model is missing.

Apologies, apparently I never got to upload the model…

You can check again at dmlls/all-mpnet-base-v2-negation. I also added a few examples to the widget so that anyone can play with the model directly from the model page. I recommend comparing scores with its base model (sentence-transformers/all-mpnet-base-v2) to verify that the finetuning did work.
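For example, a quick way to compare the two (a minimal sketch; the negation-finetuned model should assign a noticeably lower score to the negated pair):

```python
from sentence_transformers import SentenceTransformer, util

sentences = [
    "I like rainy days because they make me feel relaxed.",
    "I don't like rainy days because they don't make me feel relaxed.",
]

for model_name in (
    "sentence-transformers/all-mpnet-base-v2",  # base model
    "dmlls/all-mpnet-base-v2-negation",         # negation-finetuned model
):
    model = SentenceTransformer(model_name)
    embeddings = model.encode(sentences, convert_to_tensor=True)
    print(model_name, round(util.cos_sim(embeddings[0], embeddings[1]).item(), 3))
```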