I was wondering whether it is sensible to use BLOOM for sentence classification tasks (i.e., using the last_hidden_states
in combination with a classification layer).
It seems to work for my test case. However, I'm curious whether this works only by chance (and I'm interpreting my results wrongly), because BLOOM is listed for text generation, not sentence classification, on Hugging Face.
Usually, one uses models with bidirectional attention (like BERT or RoBERTa) for text classification tasks, to benefit from context on both sides. Models like GPT-2 or BLOOM use a causal attention mask, ensuring that only context from the left is taken into account.
They are better suited to text generation than to text classification, NER, etc. (though you can simply ask the model to generate a class or named entities, for instance).
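That said, using the hidden states of a causal model for classification can work: because of the causal mask, the *last* token is the only position that has attended to the whole sentence, so that is the state to pool (rather than a mean or a CLS-style first token). Here is a minimal, self-contained sketch of this idea using a plain PyTorch attention layer (not BLOOM itself; the 2-class head and all dimensions are made up for illustration):

```python
import torch

torch.manual_seed(0)
seq_len, d = 5, 8
x = torch.randn(1, seq_len, d)  # a toy "sentence" of 5 token embeddings

attn = torch.nn.MultiheadAttention(d, num_heads=2, batch_first=True)
# Causal mask: True entries are blocked, so position i only sees positions <= i
causal_mask = torch.triu(torch.ones(seq_len, seq_len, dtype=torch.bool), diagonal=1)

out, _ = attn(x, x, x, attn_mask=causal_mask)

# Perturbing the LAST token leaves all earlier outputs unchanged:
# earlier positions never attend to the right.
x2 = x.clone()
x2[0, -1] += 1.0
out2, _ = attn(x2, x2, x2, attn_mask=causal_mask)
print(torch.allclose(out[0, :-1], out2[0, :-1]))  # True

# Hence: pool the last position, the only one that saw the full input,
# and feed it to a (hypothetical) 2-class classification head.
sentence_repr = out[:, -1, :]                    # (batch, hidden)
classifier = torch.nn.Linear(d, 2)
logits = classifier(sentence_repr)               # (batch, num_classes)
```

The same last-token pooling applies when you take `last_hidden_state` from a causal model in `transformers`; with padded batches you would index the last *non-padding* token per sequence instead of position `-1`.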