Get output embeddings out of a transformer model

Assuming I am using a language model like BertForMaskedLM, how can I get the embeddings for each word in the sequence after passing the input_ids to the model? In the docs I found the function get_output_embeddings, but it returns an nn.Linear module mapping hidden_size to vocab_size (the MLM decoder head), not per-token embeddings.
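
For reference, here is roughly what I see when I call it (a minimal sketch; bert-base-uncased is just a placeholder checkpoint):

from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")
decoder = model.get_output_embeddings()
# this is the MLM decoder: an nn.Linear mapping hidden_size -> vocab_size,
# not the per-token embeddings I am looking for
print(type(decoder), decoder.weight.shape)  # e.g. torch.Size([30522, 768]) for bert-base-uncased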

My full model looks something like this:

import torch.nn as nn
# transformers v3.x imports; in v4+ these classes live under
# transformers.models.albert.modeling_albert
from transformers.modeling_albert import (
    AlbertMLMHead,
    AlbertModel,
    AlbertPreTrainedModel,
)


class BYOLLM(AlbertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)

        self.albert = AlbertModel(config)
        self.predictions = AlbertMLMHead(config)
        self.init_weights()
        self.tie_weights()
        self.config = config

        self.mlp = nn.Sequential(
            nn.Linear(config.hidden_size, 4096),
            nn.BatchNorm1d(4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, config.hidden_size),
        )

    def tie_weights(self):
        self._tie_or_clone_weights(
            self.predictions.decoder, self.albert.embeddings.word_embeddings
        )

    def get_output_embeddings(self):
        return self.predictions.decoder

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        masked_index=None,
        **kwargs
    ):

        outputs = self.albert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
        )
        sequence_outputs = outputs[0]  # last hidden states: (batch_size, seq_length, hidden_size)

        prediction_scores = self.predictions(sequence_outputs)  # (batch_size, seq_length, vocab_size)

        return prediction_scores
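
For context, I instantiate and call it roughly like this (a minimal sketch; the checkpoint name and example sentence are just placeholders):

from transformers import AlbertTokenizer

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = BYOLLM.from_pretrained("albert-base-v2")
model.eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
prediction_scores = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
)
print(prediction_scores.shape)  # (1, seq_length, vocab_size)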

What I want is to access the prediction-score logits and get the corresponding embedding for a specific word; getting the embedding is the part I am asking about.
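
Concretely, something like this sketch is what I am after (word_index is a hypothetical position of the word I care about, and outputs is what self.albert returns):

# hypothetical sketch of the goal: pick the hidden-state vector
# for one specific word out of the sequence output
sequence_outputs = outputs[0]                     # (batch_size, seq_length, hidden_size)
word_embedding = sequence_outputs[0, word_index]  # (hidden_size,)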

I have found this in the docs:

hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

But when using this option, I expected a tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer), each of shape (batch_size, sequence_length, hidden_size). However, after passing my input to the model, the entry for the embedding output has shape (1, hidden_size) instead of (1, seq_length, hidden_size).

Note: the batch size is 1.

It’s a bit hard to understand what may be wrong without seeing your code. The shapes of the hidden states and embeddings are tested in the common tests, so they should be right.
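
For a quick sanity check you could print the shapes directly, e.g. something along these lines (an untested sketch, reusing the tokenizer and model from your code above):

input_ids = tokenizer("a short test sentence", return_tensors="pt")["input_ids"]
outputs = model.albert(input_ids, output_hidden_states=True)
hidden_states = outputs[2]  # tuple: embedding output + one tensor per encoder layer
for i, h in enumerate(hidden_states):
    print(i, h.shape)       # each entry should be (batch_size, seq_length, hidden_size)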

@sgugger I am feeding a single entry of input_ids to the model; that is the only part of the code you have not seen.
Is the second part of the tuple (one for the output of each layer), of shape (batch_size, sequence_length, hidden_size), in ascending or descending order? And will it include the last output layer as well? If that's the case, I can access it from the second part of the tuple (see the sketch after the list):
output_layer = output[2][1][-1] or output[2][1][0]

  • 2 to access the hidden states
  • 1 to access the second part of the tuple (hidden states layers)
  • 0 or -1 to access the last layer, which is the output
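
From reading the source, I now suspect the hidden_states tuple is flat rather than nested, so the indexing would look like this (assuming I am reading it correctly):

hidden_states = output[2]            # flat tuple of length num_layers + 1
embedding_output = hidden_states[0]  # output of the embedding layer
last_layer = hidden_states[-1]       # output of the last encoder layer (layers are in ascending order)
# each entry has shape (batch_size, seq_length, hidden_size)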

Hi @abdallah197, I have the exact same question and have not been able to figure it out. I hope you found an answer; could you please share your findings?

Thank you!