I want to feed the last-layer hidden state generated by RoBERTa into a dense head.
out = pretrained_roberta(dummy_input["input_ids"],
                         dummy_input["attention_mask"], output_hidden_states=True)
out = out.hidden_states[-1]  # last layer; hidden_states[0] is the embedding output
out = nn.Dense(features=3)(out)
Is that equivalent to pooler_output in BERT?
pooler_output (torch.FloatTensor of shape (batch_size, hidden_size)) — Last layer hidden-state of the first token of the sequence (classification token) after further processing through the layers used for the auxiliary pretraining task.
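For context, `hidden_states` is a tuple of `num_hidden_layers + 1` arrays: index 0 is the embedding output and index -1 is the final transformer layer, while `pooler_output` is the first token of that last layer passed through an extra dense + tanh head. A minimal sketch of the indexing, using NumPy placeholders instead of real model outputs (the shapes and layer count are illustrative assumptions):

```python
import numpy as np

# Placeholder stand-ins for the real hidden states: 12 transformer
# layers plus the embedding output, each (batch, seq_len, hidden).
num_layers, batch, seq_len, hidden = 12, 2, 8, 768
hidden_states = tuple(
    np.full((batch, seq_len, hidden), fill_value=i, dtype=np.float32)
    for i in range(num_layers + 1)
)

embeddings = hidden_states[0]   # embedding output, NOT the last layer
last_layer = hidden_states[-1]  # final transformer layer

# pooler_output corresponds to the first ([CLS]) token of the last
# layer, after the extra dense + tanh pooler head (not applied here).
first_token = last_layer[:, 0, :]  # shape (batch, hidden)

print(first_token.shape)  # (2, 768)
```

So taking `hidden_states[-1][:, 0, :]` gives the pre-pooler first-token state; it only matches `pooler_output` after the pooler's dense + tanh transform.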