AWD-LSTM beats fine-tuned BERT as training dataset size decreases?! 🤷🏽

  • I’m building a radiology report classifier. In this particular example, the input is the radiology report text and the target is whether the patient has micro-calcifications (positive, negative, n/a).
  • I’m comparing two models: 1) an AWD-LSTM and 2) a BERT model that was pre-trained on radiology reports.
  • The dataset is stratified by target label and split 50/50 into train/validation sets (n = 450/450).
  • When I train on the full split with train_ds = 450, I get 90%+ accuracy with the pre-trained BERT model vs. ~80% accuracy for the AWD-LSTM.
  • However, as I scale back the training dataset size with `df_train = df_train.sample(frac=frac, random_state=42)`, the BERT model performs worse than the AWD-LSTM, which is not what I expected (setup sketched below).
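For reference, here’s a minimal sketch of the setup as described, assuming a dataframe `df` with hypothetical `report`/`target` columns and scikit-learn for the stratified split:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stratified 50/50 split by label (n = 450/450)
df_train, df_valid = train_test_split(
    df, test_size=0.5, stratify=df["target"], random_state=42
)

# Naive subsampling to shrink the training set, e.g. frac = 0.5, 0.25, ...
frac = 0.5
df_train = df_train.sample(frac=frac, random_state=42)
```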

Have you seen similar results in your training?

Found the culprit!
It turned out to be a fluke caused by how I was creating the progressively smaller subsets. I made the mistake of doing a stratified 50/50 split and then randomly sampling a percentage of that split for each new subset. Randomly sampling after the split renders the stratification moot, since the smaller subsets’ label distributions can drift away from the original proportions (fix sketched below).
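A minimal sketch of the fix, assuming pandas >= 1.1 and a hypothetical label column named `target`: sampling within each label group keeps the class proportions intact at every `frac`.

```python
import pandas as pd

def stratified_subset(df: pd.DataFrame, frac: float,
                      label_col: str = "target", seed: int = 42) -> pd.DataFrame:
    """Sample `frac` of each label group so class proportions are preserved."""
    return (
        df.groupby(label_col, group_keys=False)
          .sample(frac=frac, random_state=seed)
    )

# e.g. df_train = stratified_subset(df_train, frac=0.25)
```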

One thing that still confuses me is the following:

  • Why is an AWD-LSTM-based text classifier getting comparable results to a domain-specific BERT model?
    • I skipped the LM fine-tuning step for the AWD-LSTM (a sketch of that step follows this list)
    • The BERT model was fine-tuned on Radiology reports
    • My classification dataset size is fairly small
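
For completeness, here’s a minimal sketch of the skipped LM fine-tuning step in the ULMFiT style, assuming fastai v2 (where AWD-LSTM usually comes from) and the same hypothetical `report`/`target` column names; the epoch counts are placeholders, not tuned values:

```python
from fastai.text.all import *

# 1) Fine-tune the pretrained AWD-LSTM language model on the raw report text
dls_lm = TextDataLoaders.from_df(df, text_col="report", is_lm=True, valid_pct=0.1)
learn_lm = language_model_learner(dls_lm, AWD_LSTM, metrics=Perplexity())
learn_lm.fine_tune(5)
learn_lm.save_encoder("radiology_encoder")

# 2) Build the classifier on top of the fine-tuned encoder,
#    reusing the LM vocab so the embeddings line up
dls_clf = TextDataLoaders.from_df(
    df, text_col="report", label_col="target", text_vocab=dls_lm.vocab
)
learn_clf = text_classifier_learner(dls_clf, AWD_LSTM)
learn_clf.load_encoder("radiology_encoder")
learn_clf.fine_tune(5)
```

In ULMFiT terms, that LM step is what adapts the encoder to radiology vocabulary before classification, which is typically where the small-dataset gains come from, so skipping it makes the comparable results more surprising, not less.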