Small doubt: Is there a mistake in how I'm getting results from the DeBERTa model on the SNLI dataset?

Hi everyone.

from transformers import DebertaTokenizer, DebertaForSequenceClassification
import torch


max_length = 512

premise = "I do not love you"
hypothesis = "I love you"

hg_model_hub_name = "microsoft/deberta-base-mnli"

tokenizer = DebertaTokenizer.from_pretrained(hg_model_hub_name)
model = DebertaForSequenceClassification.from_pretrained(hg_model_hub_name)

tokenized_input_seq_pair = tokenizer.encode_plus(premise, hypothesis,
                                                  max_length=max_length,
                                                  return_token_type_ids=True, truncation=True)

input_ids = torch.Tensor(tokenized_input_seq_pair['input_ids']).long().unsqueeze(0)
# remember bart doesn't have 'token_type_ids', remove the line below if you are using bart.
token_type_ids = torch.Tensor(tokenized_input_seq_pair['token_type_ids']).long().unsqueeze(0)
attention_mask = torch.Tensor(tokenized_input_seq_pair['attention_mask']).long().unsqueeze(0)

outputs = model(input_ids,
                attention_mask=attention_mask,
                token_type_ids=token_type_ids,
                labels=None)
# Note:
# "id2label": {
#     "0": "entailment",
#     "1": "neutral",
#     "2": "contradiction"
# },

predicted_probability = torch.softmax(outputs[0], dim=1)[0].tolist()  # batch_size only one

print("Premise:", premise)
print("Hypothesis:", hypothesis)
print("Entailment:", predicted_probability[0])
print("Neutral:", predicted_probability[1])
print("Contradiction:", predicted_probability[2])

The output is:

Premise: I do not love you
Hypothesis: I love you
Entailment: 0.9993366599082947
Neutral: 0.0004206844314467162
Contradiction: 0.00024267268599942327

While calculating accuracy with the above code on the SNLI dataset I am getting 29%, which should not be…
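Roughly, the loop I used to score the dataset looks like this (a simplified sketch; I evaluate one example at a time and compare the argmax directly against the SNLI label column):

from datasets import load_dataset
from sklearn.metrics import classification_report

# drop examples without a gold label (label == -1)
snli_val = load_dataset("snli", split="validation").filter(lambda ex: ex["label"] != -1)

preds, labels = [], []
for ex in snli_val:
    enc = tokenizer(ex["premise"], ex["hypothesis"], return_tensors="pt",
                    truncation=True, max_length=max_length)
    with torch.no_grad():
        logits = model(**enc).logits
    preds.append(int(logits.argmax(dim=-1)))
    labels.append(ex["label"])

print(classification_report(labels, preds))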

Results on the SNLI dataset:

              precision    recall  f1-score   support

           0       0.04      0.04      0.04      3329
           1       0.82      0.81      0.82      3235
           2       0.02      0.03      0.02      3278

    accuracy                           0.29      9842
   macro avg       0.30      0.29      0.29      9842
weighted avg       0.29      0.29      0.29      9842

Can anybody guide me on where I am making a mistake, or how I should approach getting the entailment, neutral, and contradiction predictions from the DeBERTa model?

@DeBERTa @lewtun can you please help??

hey @akshat-suwalka i think the reason why you’re getting a much lower score on the snli dataset is due to a misalignment between the label → label_id mappings in the model and the dataset.

to explain what i mean, note that the config.json of the deberta model has the following mappings:

  "id2label": {
    "0": "CONTRADICTION",
    "1": "NEUTRAL",
    "2": "ENTAILMENT"
  },
  "label2id": {
    "CONTRADICTION": 0,
    "ENTAILMENT": 2,
    "NEUTRAL": 1
  }

while the snli dataset has the contradiction and entailment labels swapped:

id2label = {0: 'entailment', 1: 'neutral', 2: 'contradiction'}
label2id = {'contradiction': 2, 'entailment': 0, 'neutral': 1}
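you can double-check both mappings straight from the model config and the dataset features (a quick sketch):

from datasets import load_dataset
from transformers import AutoConfig

config = AutoConfig.from_pretrained("microsoft/deberta-base-mnli")
print(config.id2label)
# {0: 'CONTRADICTION', 1: 'NEUTRAL', 2: 'ENTAILMENT'}

snli_val = load_dataset("snli", split="validation")
print(snli_val.features["label"].names)
# ['entailment', 'neutral', 'contradiction']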

To fix this you can use the Dataset.align_labels_with_mapping function (docs):

from datasets import load_dataset
from transformers import DebertaForSequenceClassification

hg_model_hub_name = "microsoft/deberta-base-mnli"

model = DebertaForSequenceClassification.from_pretrained(hg_model_hub_name)
config = model.config

snli = load_dataset("snli")
# remap the dataset's label ids so they follow the model's label2id mapping
snli_aligned = snli.align_labels_with_mapping(label2id=config.label2id, label_column="label")
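alternatively, if you'd rather leave the dataset untouched, you can remap the model's predictions into snli's label order before scoring. a rough sketch, reusing the tensors from your snippet above (the permutation is read off the two mappings):

# model order: 0 = contradiction, 1 = neutral, 2 = entailment
# snli order:  0 = entailment,    1 = neutral, 2 = contradiction
model_id_to_snli_id = {0: 2, 1: 1, 2: 0}

logits = model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids).logits
pred_snli_id = model_id_to_snli_id[int(logits.argmax(dim=-1))]
# pred_snli_id can now be compared against the snli "label" column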

hth!