Hi,

I am trying to compute `prediction_logits` using a `BertForPreTraining` model. For some reason I don't want to use `outputs.prediction_logits` directly; instead, I want to reproduce them by multiplying the last hidden state by the decoder weights. The problem is that when I do this, the result is not equal to `outputs.prediction_logits`. Here is the code:
```python
import numpy as np
import torch
from transformers import BertForPreTraining

# `device` and `inputs` (tokenizer output) are defined earlier
model = BertForPreTraining.from_pretrained(
    "bert-base-multilingual-cased", output_hidden_states=True
).to(device)

# Decoder weight and bias of the MLM head
w = model.state_dict()["cls.predictions.decoder.weight"].cpu().numpy()
b = model.state_dict()["cls.predictions.decoder.bias"].cpu().numpy()

with torch.no_grad():
    outputs = model(**inputs)

output_logits = outputs.prediction_logits.cpu().numpy()
last_hidden_states = outputs.hidden_states[-1].cpu().numpy()

# i = example index in the batch, token_idx = token position
preds = output_logits[i, token_idx]
h = last_hidden_states[i, token_idx]
h_transformed = np.dot(w, h) + b
```
Basically, I expect `h_transformed` to be equal to `preds`, but it is not.
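In case it helps narrow things down, the `np.dot(w, h) + b` convention itself checks out against a plain `torch.nn.Linear` layer. This is just a toy sketch with made-up sizes, not the real model, so the mismatch above must come from something else in the head, not from the multiplication:

```python
import numpy as np
import torch

torch.manual_seed(0)

# Toy sizes standing in for hidden_size and vocab_size (assumptions)
hidden, vocab = 8, 20
linear = torch.nn.Linear(hidden, vocab)

# Same extraction pattern as above: weight has shape (vocab, hidden)
w = linear.weight.detach().numpy()
b = linear.bias.detach().numpy()

h = torch.randn(hidden)

# Manual projection vs. the layer's own forward pass
manual = np.dot(w, h.numpy()) + b
ref = linear(h).detach().numpy()

print(np.allclose(manual, ref, atol=1e-6))
```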
Thanks for your help