I'm making ROBERTA dumber, and I don't know why

aberquand · January 19, 2021, 12:06pm

Hi there,

I’m further pre-training roberta-base on my domain-specific corpus (parsed text related to space systems) using the run_mlm.py script.

Here is my code:

import os

# Launch masked-language-model training from roberta-base
output = os.system("python run_mlm.py "
"--model_name_or_path=roberta-base "
"--overwrite_output_dir "
"--train_file='data/training.txt' "
"--validation_file='data/testing.txt' "
"--per_device_train_batch_size=8 "
"--per_device_eval_batch_size=8 "
"--do_train "
"--do_eval "
"--line_by_line "
"--save_steps=53769 "
"--num_train_epochs=40 "
"--output_dir='./spaceROBERTA/' "
"--logging_steps=4481")

The training loss is decreasing (from around 2 to 1), the perplexity over the evaluation set is a bit high but also decreasing (starts at 10 and finishes around 7). So I thought all lights were green for the training, yeah!
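
Side note for anyone comparing the loss and perplexity figures above: as far as I know, run_mlm.py reports perplexity as the exponential of the mean evaluation loss. Under that assumption, the two numbers can be converted into each other with a minimal sketch like this (the function names are hypothetical helpers, not part of the script):

```python
import math


def perplexity_from_loss(eval_loss):
    """Perplexity, assuming it is exp(mean cross-entropy loss in nats)."""
    return math.exp(eval_loss)


def loss_from_perplexity(perplexity):
    """Inverse conversion: the mean loss implied by a given perplexity."""
    return math.log(perplexity)


# Evaluation perplexities quoted in the post (start and end of training)
for ppl in (10, 7):
    print(f"perplexity {ppl} -> eval loss {loss_from_perplexity(ppl):.2f}")
```

So an evaluation perplexity of 10 corresponds to an eval loss of about 2.30, and 7 to about 1.95.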

But when I fine-tune it on our labeled dataset for a Concept Recognition task, the performance is slightly worse than roberta-base, and it gets significantly worse as the number of training epochs increases.

I’m basically making roberta-base dumber and dumber and I don’t know why…
I’d appreciate it if anyone could point me to a solution, thanks.

aberquand · March 8, 2021, 10:32am

Update: Increasing the effective batch size to 256 with gradient accumulation (16 per device × 16 accumulation steps) improved the performance:

"--per_device_train_batch_size=16 "
"--per_device_eval_batch_size=16 "
"--gradient_accumulation_steps=16 "

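For reference, here is a minimal sketch of how the effective batch size of 256 comes out of those flags. It assumes a single GPU (the device count isn't stated above) and the default of 1 gradient accumulation step for the original run; `effective_batch_size` is a hypothetical helper, not part of run_mlm.py:

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps=1, num_devices=1):
    """Number of samples contributing to each optimizer step.

    num_devices=1 is an assumption; the thread does not say how many GPUs were used.
    """
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Original run: batch size 8, no gradient accumulation flag (assumed default of 1)
original = effective_batch_size(8)

# Updated run: batch size 16 with 16 accumulation steps
updated = effective_batch_size(16, gradient_accumulation_steps=16)

print(f"original: {original}, updated: {updated}")
```

With these assumptions, each optimizer step in the updated run averages gradients over 32 times as many samples as in the original run.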