After creating my version of LiLT with a custom RoBERTa, I'm trying to train it on my own dataset for a token classification problem.
I'm not an expert at hyperparameter tuning. For this run, I've been using the following settings:
```
max_steps=20000,
per_device_train_batch_size=2,
per_device_eval_batch_size=2,
learning_rate=1e-5,
evaluation_strategy="steps",
eval_steps=100
```
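For reference, this is roughly how I pass them in (a sketch of my setup; `model`, `train_dataset`, `eval_dataset`, and `compute_metrics` are my own objects, and the `output_dir` name is just a placeholder):

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="lilt-roberta-token-cls",  # placeholder name
    max_steps=20000,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    learning_rate=1e-5,
    evaluation_strategy="steps",
    eval_steps=100,
)

trainer = Trainer(
    model=model,                      # my LiLT model with custom RoBERTa
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,  # returns precision/recall/f1/accuracy
)
trainer.train()
```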
This is what my training results look like (unfortunately, Google Colab only saved up to step 17500):
| Step | Training Loss | Validation Loss | Precision | Recall | F1 | Accuracy |
|---|---|---|---|---|---|---|
| 17000 | 0.0030 | 0.020361 | 0.956147 | 0.955905 | 0.956026 | 0.996853 |
| 17100 | 0.0030 | 0.020292 | 0.954889 | 0.960213 | 0.957544 | 0.996866 |
| 17200 | 0.0030 | 0.020027 | 0.957533 | 0.959959 | 0.958745 | 0.996960 |
| 17300 | 0.0030 | 0.020356 | 0.957360 | 0.955905 | 0.956632 | 0.996849 |
| 17400 | 0.0030 | 0.020275 | 0.959168 | 0.958439 | 0.958803 | 0.996934 |
| 17500 | 0.0021 | 0.019903 | 0.962529 | 0.956918 | 0.959715 | 0.997002 |
Could I improve these results by tuning the hyperparameters somehow, perhaps by increasing max_steps?
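To make the question concrete: is something like the following worth trying instead of just raising max_steps? This is a sketch assuming the standard Trainer API; the doubled max_steps and the patience value are guesses on my part, and it assumes my compute_metrics returns an "f1" key.

```python
from transformers import EarlyStoppingCallback, Trainer, TrainingArguments

# Same setup as above, plus checkpointing/early-stopping arguments so the
# best checkpoint by F1 is kept and training stops once eval F1 plateaus.
training_args = TrainingArguments(
    output_dir="lilt-roberta-token-cls",
    max_steps=40000,                  # guess: leave room if F1 keeps improving
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    learning_rate=1e-5,
    evaluation_strategy="steps",
    eval_steps=100,
    save_strategy="steps",
    save_steps=100,                   # must line up with eval_steps
    load_best_model_at_end=True,
    metric_for_best_model="f1",       # assumes compute_metrics returns an "f1" key
    greater_is_better=True,
)

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    compute_metrics=compute_metrics,
    # stop if eval F1 has not improved for 10 consecutive evaluations
    callbacks=[EarlyStoppingCallback(early_stopping_patience=10)],
)
trainer.train()
```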