I'm making RoBERTa dumber, and I don't know why

Hi there,

I’m further pre-training roberta-base on my domain-specific corpus (parsed text related to space systems) using the run_mlm.py script.

Here is my code:

import os

# Launch run_mlm.py to continue masked-language-model training from roberta-base
output = os.system("python run_mlm.py "
"--model_name_or_path=roberta-base "
"--overwrite_output_dir "
"--train_file='data/training.txt' "
"--validation_file='data/testing.txt' "
"--per_device_train_batch_size=8 "
"--per_device_eval_batch_size=8 "
"--do_train "
"--do_eval "
"--line_by_line "
"--save_steps=53769 "
"--num_train_epochs=40 "
"--output_dir='./spaceROBERTA/' ")
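The same call can also be written as an argument list, which avoids the shell-quoting and dash-mangling problems that a single concatenated string is prone to. This is only a sketch: `build_mlm_command` is a hypothetical helper name, the flags are exactly the ones listed above, and it assumes `run_mlm.py` sits in the current working directory.

```python
import subprocess
import sys


def build_mlm_command():
    """Return the run_mlm.py invocation as a list of arguments (no shell quoting needed)."""
    return [
        sys.executable,  # the Python interpreter currently running
        "run_mlm.py",    # assumed to be in the current working directory
        "--model_name_or_path=roberta-base",
        "--overwrite_output_dir",
        "--train_file=data/training.txt",
        "--validation_file=data/testing.txt",
        "--per_device_train_batch_size=8",
        "--per_device_eval_batch_size=8",
        "--do_train",
        "--do_eval",
        "--line_by_line",
        "--save_steps=53769",
        "--num_train_epochs=40",
        "--output_dir=./spaceROBERTA/",
    ]


cmd = build_mlm_command()
print(" ".join(cmd))

# Uncomment to actually launch training; check=True raises if the script fails.
# subprocess.run(cmd, check=True)
```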

The training loss is decreasing (from around 2 to 1), and the perplexity on the evaluation set is a bit high but also decreasing (it starts at 10 and finishes around 7). So I thought all lights were green for the training, yeah!
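For reference, run_mlm.py reports perplexity as the exponential of the mean evaluation loss, so the two numbers can be converted into each other. Here is a minimal sketch of that conversion; the function names are made up for illustration and are not part of the script.

```python
import math


def perplexity_from_loss(eval_loss: float) -> float:
    """Perplexity is exp(mean cross-entropy loss), with the loss in nats."""
    return math.exp(eval_loss)


def loss_from_perplexity(perplexity: float) -> float:
    """Inverse conversion: the mean loss that corresponds to a given perplexity."""
    return math.log(perplexity)


for ppl in (10.0, 7.0):
    print(f"perplexity {ppl:.0f} -> eval loss {loss_from_perplexity(ppl):.2f}")
# perplexity 10 -> eval loss 2.30
# perplexity 7 -> eval loss 1.95
```

So an evaluation perplexity of about 7 corresponds to an evaluation loss of about 1.95, which is noticeably higher than the final training loss of around 1.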

But when I fine-tune it on our labeled dataset for a Concept Recognition task, the performance is slightly worse than roberta-base, and it gets significantly worse as the number of training epochs increases :scream:

I’m basically making roberta-base dumber and dumber and I don’t know why…
I’d appreciate it if anyone could point me to a solution, thanks :hugs:

Update: Increasing the effective batch size to 256 through gradient accumulation improved the performance :slight_smile:

"--per_device_train_batch_size=16 "
"--per_device_eval_batch_size=16 "
"--gradient_accumulation_steps=16 "
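The figure of 256 is the product of the per-device batch size and the number of accumulation steps. A small sketch of that arithmetic follows, assuming a single GPU; `num_devices` is an assumption, since the device count is not stated above, and `effective_batch_size` is a hypothetical helper name.

```python
def effective_batch_size(per_device_train_batch_size: int,
                         gradient_accumulation_steps: int,
                         num_devices: int = 1) -> int:
    """Number of examples that contribute to each optimizer step."""
    return per_device_train_batch_size * gradient_accumulation_steps * num_devices


# Original run: batch size 8, no accumulation, one device (assumed)
print(effective_batch_size(8, 1))    # 8

# Updated run: batch size 16 with 16 accumulation steps, one device (assumed)
print(effective_batch_size(16, 16))  # 256
```

With more than one GPU the effective batch size would be multiplied by the device count as well.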