Hello, how can I get the hidden_states from a BertForPreTraining model when I pass input embeddings instead of input_ids? My understanding is that I have to calculate the input embeddings with the embeddings part of the model and then pass them to the encoder part of the model. Without an attention_mask, the resulting hidden_states are equal to the hidden_states I get when using the input_ids directly, but as soon as I define an attention_mask the hidden_states differ. What am I missing? Many thanks in advance.
Here is a simple code for reproduction:
import torch
from transformers import BertForPreTraining

model = BertForPreTraining.from_pretrained('bert-base-uncased')
# One short sequence of 12 tokens, zero-padded up to a length of 512
input_ids = torch.cat([torch.tensor([102, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 103]).unsqueeze(0), torch.zeros(1, 500)], dim=1).long()
attention_mask = torch.cat([torch.ones(1, 12), torch.zeros(1, 500)], dim=1).long()
model.eval()
model.zero_grad()
# Without attention_mask
hs1 = model(input_ids, attention_mask=None, output_hidden_states=True)['hidden_states']
input_embeddings = model.bert.embeddings(input_ids)
hs2 = model.bert.encoder(input_embeddings, attention_mask=None, output_hidden_states=True)['hidden_states']
# hs1 == hs2
# With attention_mask
hs1_am = model(input_ids, attention_mask=attention_mask, output_hidden_states=True)['hidden_states']
input_embeddings = model.bert.embeddings(input_ids)
hs2_am = model.bert.encoder(input_embeddings, attention_mask=attention_mask, output_hidden_states=True)['hidden_states']
# hs1_am != hs2_am
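My current guess (I'm not sure this is right): skimming the BertModel.forward source, it looks like the encoder is not given the raw 0/1 mask but an additive "extended" mask, so maybe I need to convert the mask before calling the encoder directly. A minimal sketch of what I mean, assuming get_extended_attention_mask is the intended helper for this:

# Hedged guess: convert the 0/1 padding mask to the additive extended mask
# (0.0 for attended positions, a large negative value for padded positions)
extended_mask = model.bert.get_extended_attention_mask(attention_mask, input_ids.shape)
hs2_am_ext = model.bert.encoder(input_embeddings, attention_mask=extended_mask, output_hidden_states=True)['hidden_states']
# Would hs1_am == hs2_am_ext hold in that case, or is there more to it?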