I am trying to use Transformers for text classification. If I am not classifying into one of the pre-made GLUE benchmarks (i.e. I'm using my own classes and texts), do I have to "fine-tune" the model? I have 35k texts and 2 labels (imbalanced: 98% vs. 2%). Can I just use AutoModelForSequenceClassification to put a softmax head on the end of the transformer and train only that head? Or do I fine-tune the whole thing, using this tutorial?
Thanks! I am excited about better understanding the field and more effectively using the library.
I'm not familiar with imbalanced datasets, but if I were you, I would try examples/text-classification/run_glue.py.
In that example, instead of passing a GLUE task name, you can pass your own train_file, validation_file, and test_file arguments.
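For example, an invocation along these lines should work (the file names and hyperparameter values here are placeholders, not a recommendation; the CSV files are assumed to have text and label columns):

```shell
# Fine-tune a pretrained model on custom CSV data instead of a GLUE task.
python examples/text-classification/run_glue.py \
  --model_name_or_path distilbert-base-uncased \
  --train_file train.csv \
  --validation_file valid.csv \
  --do_train \
  --do_eval \
  --max_seq_length 128 \
  --per_device_train_batch_size 32 \
  --learning_rate 2e-5 \
  --num_train_epochs 3 \
  --output_dir ./output
```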
The short answer to your question is that you generally do have to fine-tune one of the pretrained language models like distilbert-base-uncased using AutoModelForSequenceClassification.
To tackle the imbalance, you could try upsampling the minority class (or downsampling the majority class), or, failing that, weighting the classes directly in the loss function of the Trainer: Trainer — transformers 4.2.0 documentation
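As a sketch of the class-weighting idea (the helper name is mine, not a Transformers API): you compute per-class weights from the label frequencies, then pass them as the `weight` argument of `torch.nn.CrossEntropyLoss` inside a `Trainer` subclass that overrides `compute_loss`.

```python
from collections import Counter

def inverse_frequency_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count),
    so rarer classes get proportionally larger loss weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * counts[c]) for c in sorted(counts)}

# With the 98% / 2% split from the question,
# the minority class gets roughly a 25x weight:
labels = [0] * 98 + [1] * 2
print(inverse_frequency_weights(labels))  # ~{0: 0.51, 1: 25.0}
```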
I tried the method you mentioned from Jay Alammar's post – it did work, but performance was weak (it was beaten by my "benchmark" of tf-idf + logistic regression), so I will try fine-tuning with the Trainer instead. I may also downsample the majority class, at least at first, to make training run faster.