@sgugger I am feeding a single entry of input_ids to the model, that's only the unseen part of the code.
Is the second part of the tuple (one tensor for the output of each layer, each of shape (batch_size, sequence_length, hidden_size)) in ascending or descending order, and does it include the last output layer as well? If that's the case, I can access it from the second part of the tuple:
output_layer = output[2][1][-1] (or output[2][1][0])
- 2 to access the hidden states
- 1 to access the second part of the tuple (hidden states layers)
- 0 or -1 to access the last layer which is the output
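
One way to check the ordering directly is to inspect the returned hidden states on a small model. This is only a minimal sketch: it assumes a GPT-2 style model (GPT2Model, which may not be the model used here) built from a tiny, randomly initialised config with made-up sizes, so nothing is downloaded. It reads the hidden states by name rather than by tuple position, since the position depends on which optional outputs the model returns.

```python
import torch
from transformers import GPT2Config, GPT2Model

# Assumption: a tiny, randomly initialised GPT-2 (hypothetical sizes),
# used only to inspect the layout of the outputs.
config = GPT2Config(vocab_size=100, n_positions=32, n_embd=16, n_layer=3, n_head=2)
model = GPT2Model(config).eval()

input_ids = torch.tensor([[5]])  # batch of 1, a single token

with torch.no_grad():
    out = model(input_ids, output_hidden_states=True, return_dict=True)

# Tuple with one tensor for the embeddings output plus one per layer,
# each of shape (batch_size, sequence_length, hidden_size).
hidden_states = out.hidden_states

num_entries = len(hidden_states)             # n_layer + 1
first_shape = tuple(hidden_states[0].shape)  # shape of the embeddings output
# Compare the final entry of the tuple with the model's last hidden state.
last_matches = torch.equal(hidden_states[-1], out.last_hidden_state)

print(num_entries, first_shape, last_matches)
```

In this sketch the tuple runs from the embeddings output at index 0 up to the last layer at index -1, and that last entry is compared against out.last_hidden_state. The same check can be run on the actual model to confirm the layout there.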