Get output embeddings out of a transformer model

Assuming I am using a language model like BertForMaskedLM, how can I get the embeddings for each word in the sequence after passing the input_ids to the model? In the docs I found the function get_output_embeddings, but it returns an nn.Linear module mapping hidden_size to vocab_size (the MLM decoder head), not per-token embeddings.
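
For reference, here is roughly what I see when I call it (a minimal sketch; bert-base-uncased is just a placeholder checkpoint):

from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")
decoder = model.get_output_embeddings()
# this is the MLM decoder: an nn.Linear mapping hidden_size -> vocab_size,
# not the per-token embeddings I am looking for
print(type(decoder), decoder.weight.shape)  # e.g. torch.Size([30522, 768]) for bert-base-uncased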

My full model looks something like this:

import torch.nn as nn
# transformers v3.x imports; in v4+ these classes live under
# transformers.models.albert.modeling_albert
from transformers.modeling_albert import (
    AlbertMLMHead,
    AlbertModel,
    AlbertPreTrainedModel,
)


class BYOLLM(AlbertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)

        self.albert = AlbertModel(config)
        self.predictions = AlbertMLMHead(config)
        self.init_weights()
        self.tie_weights()
        self.config = config

        self.mlp = nn.Sequential(
            nn.Linear(config.hidden_size, 4096),
            nn.BatchNorm1d(4096),
            nn.ReLU(inplace=True),
            nn.Linear(4096, config.hidden_size),
        )

    def tie_weights(self):
        self._tie_or_clone_weights(
            self.predictions.decoder, self.albert.embeddings.word_embeddings
        )

    def get_output_embeddings(self):
        return self.predictions.decoder

    def forward(
        self,
        input_ids=None,
        attention_mask=None,
        token_type_ids=None,
        position_ids=None,
        head_mask=None,
        inputs_embeds=None,
        labels=None,
        output_attentions=None,
        output_hidden_states=None,
        masked_index=None,
        **kwargs
    ):

        outputs = self.albert(
            input_ids=input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
            position_ids=position_ids,
            head_mask=head_mask,
            inputs_embeds=inputs_embeds,
            output_attentions=output_attentions,
            output_hidden_states=output_hidden_states,
        )
        sequence_outputs = outputs[0]  # last hidden states: (batch_size, seq_length, hidden_size)

        prediction_scores = self.predictions(sequence_outputs)  # (batch_size, seq_length, vocab_size)

        return prediction_scores
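
For context, I instantiate and call it roughly like this (a minimal sketch; the checkpoint name and example sentence are just placeholders):

from transformers import AlbertTokenizer

tokenizer = AlbertTokenizer.from_pretrained("albert-base-v2")
model = BYOLLM.from_pretrained("albert-base-v2")
model.eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")
prediction_scores = model(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
)
print(prediction_scores.shape)  # (1, seq_length, vocab_size)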

What I want is to access the prediction-score logits and get the corresponding embedding for a specific word; getting the embedding is the part I am asking about.
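
Concretely, something like this sketch is what I am after (word_index is a hypothetical position of the word I care about, and outputs is what self.albert returns):

# hypothetical sketch of the goal: pick the hidden-state vector
# for one specific word out of the sequence output
sequence_outputs = outputs[0]                     # (batch_size, seq_length, hidden_size)
word_embedding = sequence_outputs[0, word_index]  # (hidden_size,)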

I have found this in the docs:

hidden_states (tuple(torch.FloatTensor), optional, returned when output_hidden_states=True is passed or when config.output_hidden_states=True) – Tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer) of shape (batch_size, sequence_length, hidden_size).

But when using this option, I expected a tuple of torch.FloatTensor (one for the output of the embeddings + one for the output of each layer), each of shape (batch_size, sequence_length, hidden_size). However, after passing my input to the model, the entry for the embedding output has shape (1, hidden_size) instead of (1, seq_length, hidden_size).

Note: the batch size is 1.

It’s a bit hard to understand what may be wrong without seeing your code. The shapes of the hidden states and embeddings are tested in the common tests, so they should be right.
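
For a quick sanity check you could print the shapes directly, e.g. something along these lines (an untested sketch, reusing the tokenizer and model from your code above):

input_ids = tokenizer("a short test sentence", return_tensors="pt")["input_ids"]
outputs = model.albert(input_ids, output_hidden_states=True)
hidden_states = outputs[2]  # tuple: embedding output + one tensor per encoder layer
for i, h in enumerate(hidden_states):
    print(i, h.shape)       # each entry should be (batch_size, seq_length, hidden_size)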

@sgugger I am feeding a single entry of input_ids to the model; that is the only part of the code you have not seen.
Is the second part of the tuple (one for the output of each layer), of shape (batch_size, sequence_length, hidden_size), in ascending or descending order? And will it include the last output layer as well? If that's the case, I can access it from the second part of the tuple (see the sketch after the list):
output_layer = output[2][1][-1] or output[2][1][0]

  • 2 to access the hidden states
  • 1 to access the second part of the tuple (hidden states layers)
  • 0 or -1 to access the last layer, which is the output
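
From reading the source, I now suspect the hidden_states tuple is flat rather than nested, so the indexing would look like this (assuming I am reading it correctly):

hidden_states = output[2]            # flat tuple of length num_layers + 1
embedding_output = hidden_states[0]  # output of the embedding layer
last_layer = hidden_states[-1]       # output of the last encoder layer (layers are in ascending order)
# each entry has shape (batch_size, seq_length, hidden_size)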

Hi @abdallah197, I have the exact same question and have not been able to figure it out. I hope you found an answer; could you please share your findings?

Thank you!