The docstring says:
LlamaForSequenceClassification uses the last token in order to do the classification, as other causal models (e.g. GPT-2) do.
Should this “last token” be an EOS token, or simply the final token of the input without any EOS? My interpretation is that it is not an EOS, because otherwise the docstring would probably say so explicitly. Moreover, many people use the EOS token as the pad token, and in that case a trailing EOS would be indistinguishable from padding, so the result would be the same as not using an EOS as the “last token” for sequence classification.
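To make the point concrete, here is a minimal sketch (my own illustration, not the actual library code) of how the last-token index could be located when padding is present: walk back from the end and take the last position whose id is not the pad id. Under this logic, if `eos_token_id` and `pad_token_id` are the same, a trailing EOS gets skipped exactly like padding.

```python
def last_token_index(input_ids, pad_token_id):
    """Return the index of the last non-pad token in a single sequence.

    Hypothetical helper for illustration; scans backwards so that any
    trailing pad tokens (or an EOS that shares the pad id) are skipped.
    """
    for i in range(len(input_ids) - 1, -1, -1):
        if input_ids[i] != pad_token_id:
            return i
    return 0  # degenerate all-pad sequence

# Suppose id 2 is both EOS and pad (a common setup):
ids = [10, 11, 12, 2, 2, 2]
print(last_token_index(ids, pad_token_id=2))  # -> 2, i.e. the token `12`
```

So with a shared EOS/pad id, the classification head would end up pooling the final *content* token either way, which is why I lean toward the “no EOS” reading.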
However, I’m not certain, so I’d appreciate it if anyone knows!