Fine-tuning the Bio_ClinicalBERT model

Hi,

I have been trying to fine-tune the emilyalsentzer/Bio_ClinicalBERT model for a multi-label text classification problem.

I'm having a hard time getting the model to generalize after fine-tuning: it overfits to the exact phrasings it has seen in the training dataset.

These are the hyper-parameters I have been using (a minimal sketch of my setup follows the list):

batch_size = 16 (because anything larger blows up my GPU RAM)
learning_rate = 2e-5
epochs = 7
weight_decay = 0.01
lr_scheduler_type = 'linear'
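
Here is a minimal sketch of what I'm running, for reference (the dummy dataset, NUM_LABELS, max_length, and output_dir below are placeholders, not my real data or code):

```python
from datasets import Dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

MODEL_NAME = "emilyalsentzer/Bio_ClinicalBERT"
NUM_LABELS = 10  # placeholder label count

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME,
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",  # uses BCEWithLogitsLoss
)

def tokenize(batch):
    enc = tokenizer(batch["text"], truncation=True,
                    padding="max_length", max_length=128)
    # multi-label targets must be float multi-hot vectors
    enc["labels"] = [[float(x) for x in row] for row in batch["labels"]]
    return enc

# placeholder data; my real training set goes here
raw = Dataset.from_dict({
    "text": ["You need to visit the Orthopedist."],
    "labels": [[1.0] + [0.0] * (NUM_LABELS - 1)],
})
train_dataset = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="bioclinicalbert-multilabel",
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=7,
    weight_decay=0.01,
    lr_scheduler_type="linear",
)

Trainer(model=model, args=args, train_dataset=train_dataset).train()
```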

For example: for the input sentence “You need to visit the Ortho-specialist” I would expect the label ‘Orthopedist’, but I don't get it. The label ‘Orthopedist’ is only predicted when the sentence uses the exact wording learnt from the training dataset: “You need to visit the Orthopedist”.

Similarly, for the input sentence “Please go and get the Magnetic Resonance Imaging done.” I would expect the label ‘MRI’, but I don't get it. The label ‘MRI’ is only predicted for the training-set wording: “Please go and get the MRI done.”
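
This is roughly how I check predictions after training (the checkpoint path, label_names, and the 0.5 threshold are placeholders):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

CKPT = "bioclinicalbert-multilabel"  # output_dir from the sketch above
tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModelForSequenceClassification.from_pretrained(CKPT)
model.eval()

label_names = ["Orthopedist", "MRI"]  # placeholder; one name per class index

def predict(text, threshold=0.5):
    enc = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        # sigmoid per label, since labels are independent in multi-label setups
        probs = torch.sigmoid(model(**enc).logits)[0]
    return [name for name, p in zip(label_names, probs) if p >= threshold]

# the paraphrase misses the label, while the training-set wording gets it:
print(predict("Please go and get the Magnetic Resonance Imaging done."))
print(predict("Please go and get the MRI done."))
```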

I have tried a smaller learning rate for fine-tuning, but then the model underfits.