Why does my model behave differently at each load?


The BertModel fine-tuned on an NER task behaves differently each time the .bin file is loaded.


When the model finishes training:

  • Input
    I am John and I work at Hugging-Face
  • Output
    [(I, O), (am, O), (John, PER), (and, O), (I, O), (work, O), (at, O), (Hugging-Face, ORG)]

After stopping the notebook session and loading the model:

  • Input
    I am John and I work at Hugging-Face
  • Output
    [(I, PER), (am, PER), (John, PER), (and, PER), (I, ORG), (work, PER), (at, O), (Hugging-Face, PER)]


Environment:

  • Colab Pro +
  • Transformers == 4.23.1
  • Torch == 1.12.1


I am currently facing an issue with my NER model, which is based on BertModel from the Transformers library and inspired by the BertForTokenClassification code base.

The issue is the following: after training and evaluating my model, I end up with a well-performing model with a validation accuracy greater than 96%. The problem is that when I save the model and load it for inference, it gives different (bad) predictions each time it is loaded. It should be noted that right after training finishes, the predictions are good; but when I stop the notebook session, start another one, and load my best saved model, it behaves differently.

Model Architecture:

class NerBertModel(nn.Module):
  def __init__(self, id2label, label2id, num_labels):
    super(NerBertModel, self).__init__()
    self.id2label = id2label
    self.label2id = label2id
    self.num_labels = num_labels
    self.bert = Config.MODEL
    classifier_dropout = (
        Config.CONFIG.classifier_dropout
        if Config.CONFIG.classifier_dropout is not None
        else Config.CONFIG.hidden_dropout_prob
    )
    self.dropout = nn.Dropout(classifier_dropout)

    self.classifier = nn.Linear(Config.CONFIG.hidden_size, num_labels)

  def forward(self, 
              input_ids: Optional[torch.Tensor] = None, 
              attention_mask: Optional[torch.Tensor] = None, 
              token_type_ids: Optional[torch.Tensor] = None,
              labels: Optional[torch.Tensor] = None):
    outputs = self.bert(input_ids, attention_mask)

    sequence_output = outputs[0]
    sequence_output = self.dropout(sequence_output)
    logits = self.classifier(sequence_output)

    loss = None
    if labels is not None:
      loss_fct = nn.CrossEntropyLoss()
      loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
    return loss, logits
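For clarity, here is a minimal sketch of how such a class is called (my illustration, not the author's code: a tiny stub encoder stands in for Config.MODEL, and all sizes are made up). The forward pass returns a (loss, logits) pair:

```python
import torch
import torch.nn as nn

HIDDEN, VOCAB, NUM_LABELS = 32, 100, 5  # toy sizes for the sketch

class StubEncoder(nn.Module):
    """Stand-in for Config.MODEL: returns a tuple whose first element is
    a (batch, seq_len, hidden) tensor, like a Hugging Face BERT encoder."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)

    def forward(self, input_ids, attention_mask=None):
        return (self.embed(input_ids),)

class NerModelSketch(nn.Module):
    def __init__(self, num_labels):
        super().__init__()
        self.num_labels = num_labels
        self.bert = StubEncoder()
        self.dropout = nn.Dropout(0.1)
        self.classifier = nn.Linear(HIDDEN, num_labels)

    def forward(self, input_ids, attention_mask=None, labels=None):
        sequence_output = self.dropout(self.bert(input_ids, attention_mask)[0])
        logits = self.classifier(sequence_output)
        loss = None
        if labels is not None:
            loss_fct = nn.CrossEntropyLoss()
            loss = loss_fct(logits.view(-1, self.num_labels), labels.view(-1))
        return loss, logits

model = NerModelSketch(NUM_LABELS)
input_ids = torch.randint(0, VOCAB, (2, 8))    # batch of 2, sequence length 8
labels = torch.randint(0, NUM_LABELS, (2, 8))  # one label id per token
loss, logits = model(input_ids, labels=labels)
print(logits.shape)  # torch.Size([2, 8, 5])
```

Calling it without labels returns (None, logits), which is what inference code should expect.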

Saved the model using:

torch.save(model.state_dict(), Config.MODEL_PATH)

Loaded the model using:

model = NerBertModel(id2label, label2id, num_labels=len(id2label))
model.load_state_dict(torch.load(Config.MODEL_PATH))  # model.bin file
model.eval()  # disable dropout for inference


The same problem also occurs when using the standard NER model BertForTokenClassification from the Transformers library directly, saving and loading it as follows:

# Save best model
model.save_pretrained("best_model")  # "best_model" is a placeholder directory

# Load the model
model = BertForTokenClassification.from_pretrained("best_model")

The seed function I am using

def seed_torch(seed=42):
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.backends.cudnn.benchmark = False
    torch.backends.cudnn.deterministic = True

Problem Fixed

I fixed it. The problem is that each time I run the notebook, unique_labels contains the labels in a different order than in the previous notebook session, so I end up with a different encoding of the labels. This is due to set(), which I used to get the unique labels and then encode them dynamically, as shown in the snippet below:

unique_labels = set(label for labels in data["token_labels"].values for label in labels)
label2id = {k: v for v, k in enumerate(unique_labels)}
id2label = {v: k for v, k in enumerate(unique_labels)}
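To make the cause concrete, here is a small demonstration of mine (not from the original answer): every fresh interpreter session gets a new string-hash seed, so iterating the same set of labels can yield a different order each time. Different PYTHONHASHSEED values below stand in for separate notebook sessions:

```python
import os
import subprocess
import sys

# The same set of label strings, built in separate Python processes.
snippet = "print(list({'O', 'B-PER', 'I-PER', 'B-ORG', 'I-ORG'}))"

orders = set()
for seed in range(10):  # simulate 10 separate notebook sessions
    env = {**os.environ, "PYTHONHASHSEED": str(seed)}
    out = subprocess.run([sys.executable, "-c", snippet],
                         env=env, capture_output=True, text=True)
    orders.add(out.stdout.strip())

# More than one distinct iteration order means enumerate() would assign
# different ids to the same labels in different sessions.
print(len(orders) > 1)
```

Because enumerate() assigns ids in iteration order, any change in that order silently remaps every label id.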

It should be noted that the problem occurred even when:

  1. Using the standard BertForTokenClassification model from the Hugging-Face transformers library with save_pretrained() and from_pretrained(). However, it is still recommended to save and load the best model using save_pretrained() and from_pretrained() respectively when the model is based on the Hugging-Face transformers library.
  2. Running the notebook on the local host.

So just avoid using set(), or sort its output before label encoding, so that you always end up with the same label encoding.
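This also explains why the seeding function in the question could not prevent the problem (my observation, not part of the original answer): PYTHONHASHSEED is read once at interpreter startup, so torch/cudnn seeding, or even setting the variable at runtime, leaves string hashing, and hence set iteration order, untouched:

```python
import os

# String hashing is fixed when the interpreter starts; setting PYTHONHASHSEED
# afterwards (e.g. inside a seed function) has no effect on the running process.
before = hash("B-PER")
os.environ["PYTHONHASHSEED"] = "0"
after = hash("B-PER")
print(before == after)  # True: in-process hashing is unchanged
```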


I am confused, can you please explain it again?

labels = [i.split() for i in df['labels'].values.tolist()]
unique_labels = set()

for lb in labels:
    [unique_labels.add(i) for i in lb if i not in unique_labels]
labels_to_ids = {k: v for v, k in enumerate(unique_labels)}
ids_to_labels = {v: k for v, k in enumerate(unique_labels)}

This is my code for unique_labels; how should I modify it?
I understood your second part about using save_pretrained() and that the order of labels should stay the same, but how?

Ok, it worked. Thanks for your solution @NouRed.
The updated code would look like this:

labels = [i.split() for i in df['labels'].values.tolist()]
unique_labels = set()

for lb in labels:
    [unique_labels.add(i) for i in lb if i not in unique_labels]

# Sort the unique_labels
sorted_unique_labels = sorted(unique_labels)

# Create labels_to_ids and ids_to_labels mappings
labels_to_ids = {label: idx for idx, label in enumerate(sorted_unique_labels)}
ids_to_labels = {idx: label for idx, label in enumerate(sorted_unique_labels)}