following the instructions for run_summarization.py in the examples/summarization/ folder.
To also evaluate the trained model during training, I set evaluation_strategy = "steps"; the bash script is:
```bash
CUDA_VISIBLE_DEVICES=4,5,6,7 python -m torch.distributed.launch \
    --nproc_per_node 4 \
    --use_env \
    $(pwd)/run_summarization.py \
    --model_name_or_path /path/to/bart-base \
    --dataset_name cnn_dailymail \
    --dataset_config_name 3.0.0 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --num_train_epochs 5 \
    --do_train \
    --do_eval \
    --predict_with_generate \
    --learning_rate 3e-5 \
    --label_smoothing_factor 0.1 \
    --weight_decay 0.01 \
    --max_grad_norm 1.0 \
    --logging_strategy 'steps' \
    --logging_steps 1000 \
    --save_strategy 'steps' \
    --save_steps 5000 \
    --save_total_limit 3 \
    --evaluation_strategy 'steps' \
    --eval_steps 5000 \
    --fp16 \
    --output_dir /path/to/output_dir
```
But when evaluating during training, the eval results do not look good: max_length and num_beams fall back to the default values from BART's config file (max_length=20, num_beams=4).
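For illustration, here is a minimal sketch of how to confirm (and, as a stop-gap, raise) those config defaults before training; the path is a placeholder and 142 is just an example value borrowed from the bart-large-cnn config:

```python
from transformers import BartForConditionalGeneration

model = BartForConditionalGeneration.from_pretrained("/path/to/bart-base")
print(model.config.max_length, model.config.num_beams)  # e.g. 20 and 4

# Stop-gap: raise the config defaults so the in-training evaluation
# generates full-length summaries instead of stopping at 20 tokens.
model.config.max_length = 142  # example value, taken from bart-large-cnn
model.config.num_beams = 4
```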
Functions related to the problem might be (see the snippet after this list):
train -- trainer.py
_maybe_log_save_evaluate -- trainer.py
evaluate -- trainer_seq2seq.py
evaluate -- trainer.py
evaluation_loop -- trainer.py
prediction_step -- trainer_seq2seq.py
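The key link in that chain is that _maybe_log_save_evaluate calls evaluate() with no arguments, roughly like this (paraphrased from trainer.py, not the verbatim source):

```python
# Inside Trainer._maybe_log_save_evaluate (paraphrased): evaluate() gets
# no generation arguments, so Seq2SeqTrainer.evaluate falls back to
# model.config.max_length and model.config.num_beams.
if self.control.should_evaluate:
    metrics = self.evaluate()
```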
Maybe Seq2SeqTrainer in trainer_seq2seq.py should override the _maybe_log_save_evaluate function to explicitly provide the max_length and num_beams hyper-parameters?
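A minimal sketch of one possible workaround, without touching _maybe_log_save_evaluate itself (the subclass name and the gen_* attributes are my own invention; it assumes the evaluate signature in trainer_seq2seq.py that accepts max_length and num_beams):

```python
from transformers import Seq2SeqTrainer

class GenArgsSeq2SeqTrainer(Seq2SeqTrainer):
    """Remembers generation hyper-parameters so the argument-less
    self.evaluate() call inside _maybe_log_save_evaluate still uses them."""

    def __init__(self, *args, gen_max_length=None, gen_num_beams=None, **kwargs):
        super().__init__(*args, **kwargs)
        self._gen_max_length = gen_max_length
        self._gen_num_beams = gen_num_beams

    def evaluate(self, eval_dataset=None, ignore_keys=None,
                 metric_key_prefix="eval", max_length=None, num_beams=None):
        # Fall back to the stored values when the in-training evaluation
        # calls evaluate() without generation arguments.
        max_length = max_length if max_length is not None else self._gen_max_length
        num_beams = num_beams if num_beams is not None else self._gen_num_beams
        return super().evaluate(
            eval_dataset,
            ignore_keys=ignore_keys,
            metric_key_prefix=metric_key_prefix,
            max_length=max_length,
            num_beams=num_beams,
        )
```

With something like this, run_summarization.py could forward its --val_max_target_length and --num_beams arguments to the in-training evaluation.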
Another problem is in run_summarization_no_trainer.py: at line 293, the argument parser does not provide an fp16 option, so at line 444 accelerator.use_fp16 must be False. I guess fp16 should be added to the parser as a hyper-parameter?
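A hedged sketch of what I mean (the --fp16 name is my suggestion, and Accelerator(fp16=...) is the keyword early accelerate versions accept; newer versions use mixed_precision="fp16" instead):

```python
import argparse

from accelerate import Accelerator

parser = argparse.ArgumentParser()
# ... the existing arguments of run_summarization_no_trainer.py ...
parser.add_argument(
    "--fp16",
    action="store_true",
    help="Enable mixed-precision training through accelerate.",
)
args = parser.parse_args()

# Pass the flag through so accelerator.use_fp16 reflects the user's
# choice instead of always being False.
accelerator = Accelerator(fp16=args.fp16)
```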