I have been trying to fine-tune the emilyalsentzer/Bio_ClinicalBERT model for a multi-label text classification problem.
I am having a hard time getting the model to generalize after fine-tuning: it overfits to the exact phrasings it has seen in the training dataset.
These are the hyper-parameters I have been using:
batch_size = 16 (anything larger and my GPU runs out of memory)
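For context, here is a minimal sketch of the kind of setup I am using. The toy dataset, the label count, the learning rate, and the epoch count below are placeholders rather than my exact configuration; the key parts are problem_type="multi_label_classification" and the batch size of 16:

```python
import torch
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"
NUM_LABELS = 2  # placeholder: two labels just for this sketch

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # switches the loss to BCEWithLogitsLoss
)

# Toy training data; multi-label targets must be multi-hot *float* vectors
texts = ["You need to visit the Orthopedist",
         "Please go and get the MRI done."]
labels = [[1.0, 0.0],   # 'Orthopedist'
          [0.0, 1.0]]   # 'MRI'
enc = tokenizer(texts, truncation=True, padding=True)

class MultiLabelDataset(torch.utils.data.Dataset):
    def __init__(self, encodings, labels):
        self.encodings, self.labels = encodings, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, idx):
        item = {k: torch.tensor(v[idx]) for k, v in self.encodings.items()}
        item["labels"] = torch.tensor(self.labels[idx])
        return item

train_ds = MultiLabelDataset(enc, labels)

training_args = TrainingArguments(
    output_dir="clinicalbert-multilabel",
    per_device_train_batch_size=16,  # the batch size mentioned above
    num_train_epochs=3,              # placeholder value
    learning_rate=2e-5,              # placeholder value
    weight_decay=0.01,
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_ds)
trainer.train()
```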
For example, if my input sentence is “You need to visit the Ortho-specialist”, I would expect the label ‘Orthopedist’, but the model does not predict it. Only if the sentence matches the phrasing learnt from the training dataset, “You need to visit the Orthopedist”, is the label predicted as ‘Orthopedist’.
Similarly, if my input sentence is “Please go and get the Magnetic Resonance Imaging done.”, I would expect the label ‘MRI’, but the model does not predict it. Only if the sentence matches the phrasing learnt from the training dataset, “Please go and get the MRI done.”, is the label predicted as ‘MRI’.
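For reference, this is roughly how I check predictions at inference time, continuing from the setup above. The id2label mapping and the 0.5 threshold are placeholders; an independent sigmoid per label followed by thresholding is the standard multi-label decision rule:

```python
import torch

id2label = {0: "Orthopedist", 1: "MRI"}  # placeholder index-to-label mapping

def predict_labels(text, threshold=0.5):
    # Tokenize and run a single forward pass
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    # Multi-label: independent sigmoid per label, then threshold
    probs = torch.sigmoid(logits).squeeze(0)
    return [id2label[i] for i, p in enumerate(probs.tolist()) if p >= threshold]

# Paraphrases that I would expect to map to the same labels
print(predict_labels("You need to visit the Ortho-specialist"))                  # expected: ['Orthopedist']
print(predict_labels("Please go and get the Magnetic Resonance Imaging done."))  # expected: ['MRI']
```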
I have also tried a smaller learning rate for fine-tuning, but then the model underfits.