Hi! I am trying to recover the initial input representation that BERT generates for each token in a given sequence. I know this initial representation is formed by summing the token, positional, and segment embeddings (see the figure in the paper). I also know these individual embeddings (token, positional and segment) can be accessed through the model. However, is there any way I can extract the complete initial input representation from the output of the model?
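For reference, this is how I can get at the three embedding tables individually (a minimal sketch, assuming the standard attributes of the BertEmbeddings module in the transformers library):

from transformers import BertModel

model = BertModel.from_pretrained('bert-base-uncased')

# the three lookup tables live inside model.embeddings (a BertEmbeddings module)
token_embeddings = model.embeddings.word_embeddings          # vocab_size x hidden_size
position_embeddings = model.embeddings.position_embeddings   # max_position_embeddings x hidden_size
segment_embeddings = model.embeddings.token_type_embeddings  # 2 x hidden_size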
This is what I am currently doing, but I have not been able to find this representation anywhere in the output:
# load the model and tokenizer
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased',
                                  output_hidden_states=True,
                                  output_attentions=True)

# define the sentence and tokenize it
sent = "my dog is cute"
encoded_sent = tokenizer(sent, return_tensors="pt")

# pass the sentence through the model
output = model(**encoded_sent)
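What I am effectively trying to recover is something like the sketch below, computed by hand (this assumes the internals of BertEmbeddings; as far as I can tell, the library applies LayerNorm and dropout after summing the three embeddings, so the module's output is not the raw sum from the paper's figure):

import torch

input_ids = encoded_sent["input_ids"]            # shape: (1, seq_len)
token_type_ids = encoded_sent["token_type_ids"]  # segment ids, shape: (1, seq_len)
position_ids = torch.arange(input_ids.size(1)).unsqueeze(0)

emb = model.embeddings
# raw sum of token + positional + segment embeddings, as in the paper's figure
raw_sum = (emb.word_embeddings(input_ids)
           + emb.position_embeddings(position_ids)
           + emb.token_type_embeddings(token_type_ids))

# the same sum after LayerNorm, which is what enters the first encoder layer
# (dropout is a no-op since from_pretrained returns the model in eval mode)
initial_repr = emb(input_ids=input_ids, token_type_ids=token_type_ids)

In particular, I am not sure whether the representation I should be looking for in the output is the raw sum or the post-LayerNorm version.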