Using Seq2SeqTrainer to eval during training?

I'm following the instructions for run_summarization.py in the examples/summarization/ folder.

To evaluate the trained model during training, I set evaluation_strategy = "steps". The bash script is:

CUDA_VISIBLE_DEVICES=4,5,6,7 python -m torch.distributed.launch \
    --nproc_per_node 4 \
    --use_env \
    $(pwd)/run_summarization.py \
    --model_name_or_path /path/to/bart-base \
    --dataset_name cnn_dailymail \
    --dataset_config_name 3.0.0 \
    --per_device_train_batch_size 16 \
    --per_device_eval_batch_size 16 \
    --num_train_epochs 5 \
    --do_train \
    --do_eval \
    --predict_with_generate \
    --learning_rate 3e-5 \
    --label_smoothing_factor 0.1 \
    --weight_decay 0.01 \
    --max_grad_norm 1.0 \
    --logging_strategy 'steps' \
    --logging_steps 1000 \
    --save_strategy 'steps' \
    --save_steps 5000 \
    --save_total_limit 3 \
    --evaluation_strategy 'steps' \
    --eval_steps 5000 \
    --fp16 \
    --output_dir /path/to/output_dir

But when evaluating during training, the eval results do not look right. The max_length & num_beams fall back to the default values from the BART config files (max_length=20, num_beams=4).
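
As a quick sanity check on where those defaults come from (a small sketch; I'm assuming the local checkpoint is just a copy of facebook/bart-base from the Hub):

    from transformers import AutoConfig

    # Generation defaults that the in-training evaluation silently falls back to.
    config = AutoConfig.from_pretrained("facebook/bart-base")
    print(config.max_length, config.num_beams)  # reported above as 20 and 4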

The functions relevant to the problem might be:

train -- trainer.py
_maybe_log_save_evaluate -- trainer.py
evaluate -- trainer_seq2seq.py
evaluate -- trainer.py
evaluation_loop -- trainer.py
prediction_step -- trainer_seq2seq.py

Maybe Seq2SeqTrainer in trainer_seq2seq.py should override the _maybe_log_save_evaluate function to explicitly provide the max_length & num_beams hyper-parameters?
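
In the meantime, here is a minimal workaround sketch. It assumes a transformers version where Seq2SeqTrainer.evaluate still accepts max_length and num_beams; the subclass name and the 142/4 values are my own example settings for CNN/DailyMail, not anything the script sets by default:

    from transformers import Seq2SeqTrainer

    class PatchedSeq2SeqTrainer(Seq2SeqTrainer):
        """Force explicit generation settings for evaluations triggered inside the training loop."""

        def evaluate(self, eval_dataset=None, ignore_keys=None, metric_key_prefix="eval",
                     max_length=None, num_beams=None):
            # _maybe_log_save_evaluate in trainer.py calls self.evaluate() with no generation
            # kwargs, so without this override the model config defaults (max_length=20) win.
            max_length = max_length if max_length is not None else 142  # example value for CNN/DailyMail
            num_beams = num_beams if num_beams is not None else 4
            return super().evaluate(
                eval_dataset,
                ignore_keys=ignore_keys,
                metric_key_prefix=metric_key_prefix,
                max_length=max_length,
                num_beams=num_beams,
            )

In run_summarization.py this would mean swapping Seq2SeqTrainer for the subclass, and ideally reading the two values from the script's --val_max_target_length and --num_beams arguments instead of hard-coding them.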


Another problem is in run_summarization_no_trainer.py.
At line 293, fp16 is not provided, so at line 444 accelerator.use_fp16 must be False. I guess fp16 should be added to the parser as a hyper-parameter?
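
Something along these lines might work (just a sketch: the --fp16 argument name is my own suggestion, older accelerate versions accept Accelerator(fp16=...), and newer ones use mixed_precision="fp16" instead):

    import argparse

    from accelerate import Accelerator

    parser = argparse.ArgumentParser(description="run_summarization_no_trainer with an fp16 flag")
    # ... the script's existing arguments would stay here ...
    parser.add_argument("--fp16", action="store_true", help="Use mixed-precision training.")
    args = parser.parse_args()

    # Passing the flag through makes accelerator.use_fp16 reflect the user's choice,
    # so the check later in the script can actually come out True.
    accelerator = Accelerator(fp16=args.fp16)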


The same question applies to Trainer as well.