Hi, I am currently further pretraining my own BERT model for the cooking domain. I chose the bert-base-uncased model as a starting point and use the run_mlm.py script to further pretrain the model and adapt it to the cooking data. The data consists of approx. 2 million recipe instructions from the RecipeNLG dataset, with 5% used as validation data.
I use the following arguments for training:
!python run_mlm.py \
--model_name_or_path=bert-base-uncased \
--output_dir=CookBERT/further_pretraining/model_output \
--do_train \
--do_eval \
--validation_split_percentage=5 \
--train_file=datasets/recipeNLG/recipeNLG_instructions.txt \
--per_device_train_batch_size=16 \
--per_device_eval_batch_size=16 \
--gradient_accumulation_steps=2 \
--learning_rate=2e-5 \
--num_train_epochs=3 \
--save_total_limit=10 \
--save_strategy=steps \
--save_steps=1000 \
--line_by_line \
--max_seq_length=256 \
--evaluation_strategy=steps \
--eval_steps=1000
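For reference, the effective batch size and per-epoch optimizer steps implied by these flags work out as below (a rough sketch; assumes a single GPU and the approximate dataset size mentioned above):

```python
import math

# Assumed figures from the post: ~2 million instructions, 5% held out for validation.
total_examples = 2_000_000
val_fraction = 0.05
train_examples = int(total_examples * (1 - val_fraction))  # 1,900,000

# Flags from the command above (single-GPU assumption).
per_device_train_batch_size = 16
gradient_accumulation_steps = 2

# Effective batch size per optimizer step.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps

# Optimizer steps per epoch.
steps_per_epoch = math.ceil(train_examples / effective_batch_size)

print(effective_batch_size)  # 32
print(steps_per_epoch)       # 59375
```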
The training process works fine, but I am curious whether there is a good explanation for why the validation loss is lower than the training loss.
Any ideas are welcome!