- I’m building a radiology report classifier. In this particular example, the input is the radiology report text and the target is whether the patient has micro-calcifications (pos / neg / n/a).
- I’m comparing two models: 1) an AWD-LSTM, and 2) a BERT model that was pre-trained on radiology reports.
- The dataset is split 50/50 into train/validation, stratified by the target labels, giving n = 450/450.
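For context, a stratified 50/50 split like this could be done with a grouped sample; the column names (`report`, `micro_calc`) below are placeholders, not my actual schema:

```python
import pandas as pd

# Hypothetical dataframe standing in for the 900 labeled reports,
# 300 per class so the example is perfectly balanced
df = pd.DataFrame({
    "report": [f"report {i}" for i in range(900)],
    "micro_calc": ["pos", "neg", "n/a"] * 300,
})

# Sample 50% within each label group so both halves keep the class balance
df_train = df.groupby("micro_calc", group_keys=False).sample(frac=0.5, random_state=42)
df_valid = df.drop(df_train.index)
```

This gives 450 rows per side with identical label proportions in train and validation.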
- When I train on the full split (train_ds = 450), I get 90+% accuracy with the pre-trained BERT model vs ~80% accuracy with the AWD-LSTM.
- However, as I scale back the training dataset size with
  ```python
  df_train = df_train.sample(frac=frac, random_state=42)
  ```
  I see worse performance from the BERT model than from the AWD-LSTM, which is not what I expected.
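One thing worth checking in a sweep like this: plain `DataFrame.sample` is not stratified, so at small fractions the label balance of the subsample can drift from the full set's, and the two models may not be equally sensitive to that. A sketch of the sweep (column name `micro_calc` is a placeholder), with a grouped sample as the stratified alternative:

```python
import pandas as pd

# Hypothetical training frame standing in for the 450 reports
df_train = pd.DataFrame({
    "report": [f"report {i}" for i in range(450)],
    "micro_calc": ["pos", "neg", "n/a"] * 150,
})

plain_sizes, strat_sizes = {}, {}
for frac in [1.0, 0.5, 0.1]:
    # Plain sample: size shrinks, but per-class counts are left to chance
    sub = df_train.sample(frac=frac, random_state=42)
    plain_sizes[frac] = len(sub)

    # Stratified sample: each class is downsampled by the same fraction
    strat = df_train.groupby("micro_calc", group_keys=False).sample(
        frac=frac, random_state=42
    )
    strat_sizes[frac] = len(strat)
```

If the BERT model's advantage disappears only with the plain sample and not the stratified one, the drop may be a data-balance effect rather than a low-data effect.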
Have you seen similar results in your training?