The output shape of last_hidden_state is [8, 197, 768]. As far as I know, the first dimension is the batch size, the second is the sequence length, and the last is the hidden size, but I am unclear about what the sequence length and hidden size actually represent here. The code is given below:
import torch

with torch.no_grad():
    outputs = model(batch['pixel_values'])
print(outputs.last_hidden_state.shape)
# torch.Size([8, 197, 768])
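My guess, assuming the model is a ViT-base checkpoint that splits a 224x224 image into 16x16 patches (I have not confirmed this for my model): 197 would be the 196 image patches plus one prepended [CLS] token, and 768 the embedding width per token. A quick sanity check of that arithmetic:

```python
# Assumptions (not confirmed for my checkpoint): ViT-base style model,
# 224x224 input images, 16x16 patches, 768-dim token embeddings.
image_size = 224
patch_size = 16

num_patches = (image_size // patch_size) ** 2  # 14 * 14 = 196 patch tokens
seq_len = num_patches + 1                      # +1 for the [CLS] token
print(seq_len)        # 197 -- matches the second dimension of the output

hidden_size = 768     # per-token embedding width, the third dimension
```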
Can anyone explain what the sequence length and hidden size actually mean here? Thanks in advance.