I am working on a project for text classification. I started out with BERT and AutoModelForSequenceClassification, and now I want to move up the food chain and try some larger models. To do that on a single A100, I was hoping to use PEFT, LoRA, and bitsandbytes or accelerate, for example. But the examples I have found all use AutoModelForCausalLM.
I tried to adapt one of the tutorials to use BloomForSequenceClassification, but the PEFT tutorial says: "Finally, we need to apply some post-processing on the 8-bit model to enable training, let's freeze all our layers, and cast the layer-norm in float32 for stability. We also cast the output of the last layer in float32 for the same reasons." and gives this code:
model.lm_head = CastOutputToFloat(model.lm_head)
That is fine for AutoModelForCausalLM, but it does not work for BloomForSequenceClassification, because that model does not have an lm_head layer.
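For what it's worth, here is my (untested) guess at how that cast would adapt. I'm assuming Bloom's sequence-classification head is the `score` attribute, which is what I see in the transformers source, standing in for the tutorial's `lm_head`:

```python
import torch
import torch.nn as nn

# From the PEFT tutorial: wrap a head module so its output is cast to float32.
class CastOutputToFloat(nn.Sequential):
    def forward(self, x):
        return super().forward(x).to(torch.float32)

# My guess at the analogous line for BloomForSequenceClassification,
# whose head attribute appears to be `score` rather than `lm_head`:
# model.score = CastOutputToFloat(model.score)
```

The freezing and layer-norm casting steps from the tutorial don't reference `lm_head`, so presumably only this one line needs changing.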
Which leads me to ask: can one use PEFT and LoRA with an AutoModelForSequenceClassification?
Alternatively, can one use an AutoModelForCausalLM for text classification?
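On that second question, the only approach I can think of is prompting: compare the model's next-token logits for one verbalizer word per class. A rough sketch of what I mean (the model name, prompt, and labels here are just placeholders I made up):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def classify(text, labels, model_name="gpt2"):
    """Pick the label whose first token gets the highest next-token logit."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    prompt = f"Review: {text}\nSentiment:"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]  # distribution over the next token
    # Score each label by the logit of its first token (with a leading space).
    scores = {lab: logits[tok(" " + lab).input_ids[0]].item() for lab in labels}
    return max(scores, key=scores.get)
```

But that feels like a workaround compared to a proper classification head, so I'd still prefer the sequence-classification route if it is supported.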