I trained a vanilla BERT model:
from transformers import BertModel

model = BertModel.from_pretrained('bert-base-uncased',
                                  num_labels=4,
                                  output_hidden_states=True,
                                  output_attentions=True)
I'm doing multi-label classification, so I trained this model with an extra logit layer attached to it, summing the final hidden states before the logit layer.
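For context, the logit layer is roughly a single linear projection from BERT's 768-dim hidden state to my 4 labels; a minimal sketch (the exact head shouldn't matter for the question):

import torch.nn as nn

# Sketch of the classification head: 768 (BERT hidden size) -> 4 labels
logit_layer = nn.Linear(768, 4)

The code for running validation is below.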
import torch

def validation_nn():
    prob_list = []
    # Toy batch: [CLS], two real tokens, [SEP], then two [PAD] tokens
    input_ids = [[101, 4769, 77, 102, 0, 0]]
    mask = [[1, 1, 1, 1, 0, 0]]
    model.eval()
    with torch.no_grad():
        outputs = model(torch.tensor(input_ids), attention_mask=torch.tensor(mask))
    last_hidden_state = outputs.last_hidden_state
    # Sum over the sequence dimension (this includes the pad positions)
    summed_final_hidden_state = torch.sum(last_hidden_state, 1)
    logits = logit_layer(summed_final_hidden_state)
    probs = torch.sigmoid(logits)
    prob_list.append(probs)
    print(outputs[-1])  # last element of the outputs: the attention weights
    #print('probs', probs)
    return prob_list

probs = validation_nn()
Here is a snippet of the output (the attention weights printed from outputs[-1]):
tensor([[[[0.2019, 0.1652, 0.0762, 0.5566, 0.0000, 0.0000],
          [0.2206, 0.1417, 0.3824, 0.2552, 0.0000, 0.0000],
          [0.0319, 0.1476, 0.6395, 0.1810, 0.0000, 0.0000],
          [0.4228, 0.1063, 0.1553, 0.3157, 0.0000, 0.0000],
          [0.1468, 0.1533, 0.4655, 0.2344, 0.0000, 0.0000],
          [0.1618, 0.1286, 0.4877, 0.2219, 0.0000, 0.0000]],

         [[0.9799, 0.0092, 0.0029, 0.0080, 0.0000, 0.0000],
          [0.0043, 0.0714, 0.9032, 0.0212, 0.0000, 0.0000],
          [0.2281, 0.1369, 0.4741, 0.1608, 0.0000, 0.0000],
          [0.3400, 0.3117, 0.0927, 0.2557, 0.0000, 0.0000],
          [0.0053, 0.1026, 0.7928, 0.0992, 0.0000, 0.0000],
          [0.0063, 0.0944, 0.8022, 0.0971, 0.0000, 0.0000]],
As you can see, the two columns on the far right, which correspond to the masked (pad) positions, are zeroed out, as they should be. But the last two rows, which also belong to pad tokens, are not zeroed out. When I go to sum my hidden states, the vectors at these pad positions get added in, which obviously skews my results, and I imagine these non-zeroed vectors mess up training as well. Is there a way to zero out the rows that belong to the mask tokens?
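In other words, I think I want to zero the pad rows of last_hidden_state before summing, something like the sketch below (reusing mask from the function above; I'm not sure this is the intended approach):

# Sketch: zero out hidden states at padded positions before summing.
# mask is (batch, seq_len); last_hidden_state is (batch, seq_len, hidden).
mask_t = torch.tensor(mask, dtype=torch.float)
masked_hidden = last_hidden_state * mask_t.unsqueeze(-1)  # broadcast over hidden dim
summed = masked_hidden.sum(dim=1)  # pad rows now contribute nothing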