How to extract encoding before classification layer?


I am using a RobertaForSequenceClassification model for natural language inference (ynie/roberta-large-snli_mnli_fever_anli_R1_R2_R3-nli on Hugging Face). How can I obtain the encoding of my sequence before it passes through the final classification layer?

Here is what I have tried.
First, I listed all of the model's parameters:
[ …

Then I used the output_hidden_states flag to obtain the hidden-state vectors for a single input (batch_size = 1):

outputs = model(input_ids, attention_mask=attention_mask, token_type_ids=token_type_ids, output_hidden_states=True, labels=None)

outputs[1] is the tuple of hidden states; each entry has shape [1, 23, 1024], where 1024 is the hidden-state dimension.
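(As a shape check, here is a minimal sketch using a tiny, randomly initialised RoBERTa instead of the real checkpoint; all config values below are made-up stand-ins, and only the tensor shapes matter:)

```python
import torch
from transformers import RobertaConfig, RobertaForSequenceClassification

# Tiny random model as a stand-in for the 1024-dim checkpoint
config = RobertaConfig(hidden_size=16, num_hidden_layers=2,
                       num_attention_heads=2, intermediate_size=32,
                       vocab_size=100, num_labels=3)
model = RobertaForSequenceClassification(config)

input_ids = torch.tensor([[0, 5, 6, 2]])  # [batch=1, seq_len=4]
outputs = model(input_ids, output_hidden_states=True)

# hidden_states is a tuple of (num_hidden_layers + 1) tensors
# (embeddings plus one per layer), each [batch, seq_len, hidden_size]
print(len(outputs.hidden_states))            # 3
print(outputs.hidden_states[-1].shape)       # torch.Size([1, 4, 16])
```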

t = []
matrix = None
bias = None
dense_m = None
dense_b = None

for name, param in model.named_parameters():
    if name == 'classifier.dense.weight':
        dense_m = param
    if name == 'classifier.dense.bias':
        dense_b = param
    if name == 'classifier.out_proj.weight':
        matrix = param
    if name == 'classifier.out_proj.bias':
        bias = param

test = outputs[1][-1][-1][-1]
t = torch.matmul(dense_m, torch.unsqueeze(test, dim=1)).T + dense_b
print(torch.matmul(matrix, t).T + bias)
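(As a sanity check on the matrix shapes, here is a toy pure-PyTorch sketch of the same dense → out_proj computation; all dimensions are made up. Note that transformers' RobertaClassificationHead feeds only the first (<s>) token's vector into the head and applies a tanh between the two linear layers:)

```python
import torch

hidden, num_labels, seq_len = 8, 3, 5   # toy stand-ins for 1024, 3, 23

dense_m = torch.randn(hidden, hidden)       # classifier.dense.weight analogue
dense_b = torch.randn(hidden)               # classifier.dense.bias analogue
matrix = torch.randn(num_labels, hidden)    # classifier.out_proj.weight analogue
bias = torch.randn(num_labels)              # classifier.out_proj.bias analogue

last_hidden = torch.randn(1, seq_len, hidden)   # [batch, seq_len, hidden]

# Take only the first token's vector, shape [1, hidden],
# rather than the whole [seq_len, hidden] sequence
pooled = last_hidden[:, 0, :]

t = pooled @ dense_m.T + dense_b                  # [1, hidden]
logits = torch.tanh(t) @ matrix.T + bias          # [1, num_labels]
print(logits.shape)  # torch.Size([1, 3])
```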


The final output should have shape [1, 3], but mine is [23, 3], and the last row is not equal to the original output. What have I done wrong, and what should I do in this case?

Thank you very much!