I want to get the pooled output from the `LxmertForPreTraining` class.
The keys provided by `LxmertForPreTrainingOutput` are:

```
odict_keys(['prediction_logits', 'cross_relationship_score', 'question_answering_score', 'language_hidden_states', 'vision_hidden_states', 'language_attentions', 'vision_attentions', 'cross_encoder_attentions'])
```

None of these contains the pooled output, so I am computing it myself like this:
```python
visual_output = output['vision_hidden_states'][-1]
lang_output = output['language_hidden_states'][-1]
pooled_output = model_lxmert.lxmert.lxmert.pooler(lang_output)
```
This is consistent with what `LxmertModel` does here:
https://github.com/huggingface/transformers/blob/main/src/transformers/models/lxmert/modeling_lxmert.py#L1004, which is basically this:
```python
hidden_states = (language_hidden_states, vision_hidden_states) if output_hidden_states else ()
visual_output = vision_hidden_states[-1]
lang_output = language_hidden_states[-1]
pooled_output = self.pooler(lang_output)
```
But since I am using the `LxmertForPreTraining` class instead of the `LxmertModel` base class, I need two steps: one forward pass to get the `LxmertForPreTraining` output, and then an extra pass through the pooler to get the pooled output.
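Put together, here is a minimal sketch of what I am doing. To keep it self-contained it uses a tiny randomly initialized `LxmertConfig` and made-up tensor sizes instead of the real pretrained checkpoint, so only the shapes are meaningful, not the values:

```python
import torch
from transformers import LxmertConfig, LxmertForPreTraining

# Tiny config with made-up sizes, just for illustration;
# in practice you would load a real pretrained checkpoint instead.
config = LxmertConfig(
    vocab_size=100,
    hidden_size=32,
    num_attention_heads=2,
    intermediate_size=64,
    l_layers=1,
    x_layers=1,
    r_layers=1,
    visual_feat_dim=8,
    visual_pos_dim=4,
)
model = LxmertForPreTraining(config)
model.eval()

# Dummy inputs: 1 sentence of 6 tokens, 3 visual regions.
batch, seq_len, num_boxes = 1, 6, 3
input_ids = torch.randint(0, config.vocab_size, (batch, seq_len))
visual_feats = torch.randn(batch, num_boxes, config.visual_feat_dim)
visual_pos = torch.randn(batch, num_boxes, config.visual_pos_dim)

with torch.no_grad():
    # Step 1: forward pass through LxmertForPreTraining.
    # output_hidden_states=True is needed so the *_hidden_states
    # fields of LxmertForPreTrainingOutput are populated.
    output = model(
        input_ids=input_ids,
        visual_feats=visual_feats,
        visual_pos=visual_pos,
        output_hidden_states=True,
    )
    # Step 2: take the last language-branch hidden state and run it
    # through the same pooler that LxmertModel applies internally.
    lang_output = output.language_hidden_states[-1]
    pooled_output = model.lxmert.pooler(lang_output)

print(pooled_output.shape)  # torch.Size([1, 32]), i.e. (batch, hidden_size)
```

Note that here I access the pooler as `model.lxmert.pooler` (the `LxmertModel` inside `LxmertForPreTraining`); in my own code the extra `.lxmert` level comes from how my model variable wraps it.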
Is this the correct way to get the pooled output from `LxmertForPreTraining`, or is there a better way to do it?