Hyperparameters LiLT with custom RoBERTa training

After creating my version of LiLT with a custom RoBERTa, I’m trying to train it on my own dataset for a token classification problem.

I’m not an expert in hyperparameters. For this scenario, I’ve been using the following hyperparameters:

max_steps=20000,
per_device_train_batch_size=2,
per_device_eval_batch_size=2,
learning_rate=1e-5,
evaluation_strategy="steps",
eval_steps=100
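For reference, the settings above map onto `TrainingArguments` roughly like this (a sketch; `output_dir` is an illustrative placeholder, not from the original setup):

```python
from transformers import TrainingArguments

# Sketch of the hyperparameters above as TrainingArguments.
# output_dir is a placeholder; everything else mirrors the list above.
training_args = TrainingArguments(
    output_dir="lilt-token-classification",  # placeholder path
    max_steps=20000,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    learning_rate=1e-5,
    evaluation_strategy="steps",
    eval_steps=100,
)
```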

This is what my training results look like (unfortunately, Google Colab only saved up to step 17500):

[Charts: Training Loss and Validation Loss per step; Precision, Recall, F1 and Accuracy per step]

| Step | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 17000 | 0.003 | 0.020361 | 0.956147 | 0.955905 | 0.956026 | 0.996853 |
| 17100 | 0.003 | 0.020292 | 0.954889 | 0.960213 | 0.957544 | 0.996866 |
| 17200 | 0.003 | 0.020027 | 0.957533 | 0.959959 | 0.958745 | 0.996960 |
| 17300 | 0.003 | 0.020356 | 0.957360 | 0.955905 | 0.956632 | 0.996849 |
| 17400 | 0.003 | 0.020275 | 0.959168 | 0.958439 | 0.958803 | 0.996934 |
| 17500 | 0.0021 | 0.019903 | 0.962529 | 0.956918 | 0.959715 | 0.997002 |

Could I improve those results by tuning the hyperparameters somehow? Perhaps increasing max_steps?

Nice!

The Trainer class supports hyperparameter tuning: Hyperparameter Search using Trainer API. This lets you launch multiple training runs across different hyperparameter settings and keep the best one.
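As a rough sketch of that API (assuming the optuna backend is installed, and with a `Trainer` built from a `model_init` callable for your LiLT setup rather than a fixed model):

```python
# Hypothetical search space for the settings discussed in this thread.
# `trial` is an optuna Trial object passed in by Trainer.hyperparameter_search.
def hp_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-4, log=True),
        "per_device_train_batch_size": trial.suggest_categorical(
            "per_device_train_batch_size", [2, 4, 8]
        ),
        "max_steps": trial.suggest_categorical("max_steps", [10000, 20000, 40000]),
    }

# With a Trainer constructed from model_init (so each trial gets a fresh model):
# best_run = trainer.hyperparameter_search(
#     hp_space=hp_space,
#     backend="optuna",
#     n_trials=10,
#     direction="minimize",  # minimize validation loss
# )
```

Each trial re-initializes the model via `model_init`, trains with the sampled settings, and `best_run` reports the winning hyperparameters.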


What a nice feature! I’ll definitely dig into that.

Just another quick question about my training loss and validation loss chart. Doesn’t it show slight overfitting? From the little I know about training results, a validation loss noticeably greater than the training loss may indicate that the model is overfitting.

But I don’t really know at what point the difference between validation loss and training loss becomes significant.
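One quick way to watch that gap is simply to track validation loss minus training loss over the logged steps (values below are copied from the table in this thread):

```python
# Train/validation loss gap per logged step, using the values from the
# results table above. A gap that stays roughly constant while validation
# loss keeps (slowly) improving is usually not alarming.
steps = [17000, 17100, 17200, 17300, 17400, 17500]
train_loss = [0.003, 0.003, 0.003, 0.003, 0.003, 0.0021]
val_loss = [0.020361, 0.020292, 0.020027, 0.020356, 0.020275, 0.019903]

gaps = [va - tr for tr, va in zip(train_loss, val_loss)]
for step, gap in zip(steps, gaps):
    print(f"step {step}: gap = {gap:.4f}")
```

Here the gap hovers around 0.017, small in absolute terms, which is why this reads as mild overfitting at most; the usual warning sign is a gap that keeps *growing* while validation loss starts rising.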