Masked vectors are included in vanilla transformer model output

I trained a vanilla BERT model:

import torch
from transformers import BertModel

model = BertModel.from_pretrained('bert-base-uncased',
                                  num_labels=4,
                                  output_hidden_states=True,
                                  output_attentions=True)

I'm doing multi-label classification, so I trained this model with an extra logit layer attached to it and then summed the results. The code for running it is below:

def validation_nn():
    prob_list = []
    # toy batch: four real tokens followed by two padding tokens
    input_ids = [[101, 4769, 77, 102, 0, 0]]
    mask = [[1, 1, 1, 1, 0, 0]]

    model.eval()

    outputs = model(torch.tensor(input_ids), attention_mask=torch.tensor(mask))

    last_hidden_state = outputs.last_hidden_state

    # sum over the sequence dimension, then project to the four labels
    summed_final_hidden_state = torch.sum(last_hidden_state, 1)
    logits = logit_layer(summed_final_hidden_state)

    probs = torch.sigmoid(logits)

    prob_list.append(probs)
    print(outputs[-1])  # the attention weights
    #print('probs', probs)

    return prob_list

probs = validation_nn()
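
For reference, the logit_layer used above is just a linear projection head; a minimal sketch of how it could be defined (the sizes are assumptions: 768 is bert-base's hidden size and 4 matches num_labels):

import torch

# hypothetical definition of the projection head used in validation_nn;
# 768 = bert-base hidden size, 4 = number of labels
logit_layer = torch.nn.Linear(768, 4)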

Here is a snippet of the output (the attentions printed by print(outputs[-1])):

tensor([[[[0.2019, 0.1652, 0.0762, 0.5566, 0.0000, 0.0000],
          [0.2206, 0.1417, 0.3824, 0.2552, 0.0000, 0.0000],
          [0.0319, 0.1476, 0.6395, 0.1810, 0.0000, 0.0000],
          [0.4228, 0.1063, 0.1553, 0.3157, 0.0000, 0.0000],
          [0.1468, 0.1533, 0.4655, 0.2344, 0.0000, 0.0000],
          [0.1618, 0.1286, 0.4877, 0.2219, 0.0000, 0.0000]],

         [[0.9799, 0.0092, 0.0029, 0.0080, 0.0000, 0.0000],
          [0.0043, 0.0714, 0.9032, 0.0212, 0.0000, 0.0000],
          [0.2281, 0.1369, 0.4741, 0.1608, 0.0000, 0.0000],
          [0.3400, 0.3117, 0.0927, 0.2557, 0.0000, 0.0000],
          [0.0053, 0.1026, 0.7928, 0.0992, 0.0000, 0.0000],
          [0.0063, 0.0944, 0.8022, 0.0971, 0.0000, 0.0000]],

As you can see, the two columns on the far right, which correspond to the masked positions, are zeroed out as they should be. But the last two rows, which also belong to mask tokens, are not zeroed out. When I sum my results, these mask vectors get added in, which obviously skews my results, and I imagine these non-zeroed vectors hurt training as well. Is there a way to zero out the rows that belong to mask tokens?
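
For concreteness, the behaviour I'm after is something like the following sketch, which multiplies the hidden states by the attention mask before summing (the masked_sum helper here is hypothetical):

import torch

def masked_sum(last_hidden_state, attention_mask):
    # last_hidden_state: (batch, seq_len, hidden_size)
    # attention_mask:    (batch, seq_len), 1 for real tokens, 0 for padding
    mask = attention_mask.unsqueeze(-1).type_as(last_hidden_state)
    # padded rows are multiplied by 0, so they contribute nothing to the sum
    return (last_hidden_state * mask).sum(dim=1)

# usage inside validation_nn, replacing the plain torch.sum:
# summed_final_hidden_state = masked_sum(last_hidden_state, torch.tensor(mask))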

Hey @bennicholl, you might have better luck using the Trainer class to run your training with a custom loss function (see docs):

import torch
from transformers import Trainer

class MultilabelTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        labels = inputs.pop("labels")
        outputs = model(**inputs)
        logits = outputs.logits
        # binary cross-entropy with logits scores each label independently
        loss_fct = torch.nn.BCEWithLogitsLoss()
        loss = loss_fct(logits.view(-1, self.model.config.num_labels),
                        labels.float().view(-1, self.model.config.num_labels))
        return (loss, outputs) if return_outputs else loss
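
The key design choice here is BCEWithLogitsLoss: it applies an independent sigmoid to each logit, so every label is treated as its own binary decision, which is exactly what multi-label classification needs (the default cross-entropy loss assumes exactly one correct class per example).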

Then you can define a compute_metrics function as follows:

from scipy.special import expit as sigmoid
from sklearn.metrics import classification_report

def compute_metrics(pred):
    y_true = pred.label_ids
    # convert logits to probabilities, then threshold at 0.5
    y_pred = sigmoid(pred.predictions)
    y_pred = (y_pred > 0.5).astype(float)

    # all_labels is your list of label names
    clf_dict = classification_report(y_true, y_pred, target_names=all_labels,
                                     zero_division=0, output_dict=True)
    return {"micro f1": clf_dict["micro avg"]["f1-score"],
            "macro f1": clf_dict["macro avg"]["f1-score"]}

and after fine-tuning you can then get the predictions on your validation set via Trainer.predict:

trainer = ...  # your fine-tuned Trainer
pred = trainer.predict(your_eval_dataset)
metrics = compute_metrics(pred)

hth!
