In my case, I'm working with SpeechT5 ASR, and the logits are a tuple of two items: the first is the decoder output (the logits that I need) and the second is the encoder's last hidden state. So I work with logits[0].
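Roughly what I mean, as a minimal sketch. The helper name `select_decoder_logits` and the stand-in values are made up for illustration; they are not part of any library, and the real tuple holds tensors or arrays rather than lists.

```python
def select_decoder_logits(logits):
    """Return the decoder logits when the model output is a tuple.

    Assumption: the tuple layout is (decoder_logits, encoder_last_hidden_state),
    as described above. Anything that is not a tuple is returned unchanged.
    """
    if isinstance(logits, tuple):
        return logits[0]
    return logits


# Stand-in values only; the real outputs are tensors or arrays.
decoder_logits = [[0.1, 0.9], [0.8, 0.2]]
encoder_last_hidden_state = [[0.0, 0.0, 0.0]]

outputs = (decoder_logits, encoder_last_hidden_state)
picked = select_decoder_logits(outputs)
```

The `isinstance` check means the same helper still works if the output arrives as a single array instead of a tuple.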