Huge difference in speed when finetuning summarization with different scripts

Of course. Thank you for looking into it.

For the tfmr3 finetune.py:

python finetune.py \
    --learning_rate=1e-4 \
    --do_train \
    --do_predict \
    --n_val 1000 \
    --num_train_epochs 1 \
    --val_check_interval 0.25 \
    --max_source_length 512 --max_target_length 56 \
    --freeze_embeds --label_smoothing 0.1 --adafactor --task summarization_xsum \
    --model_name_or_path "tuner007/pegasus_paraphrase" \
    --data_dir {data_dir} \
    --output_dir {output_dir} \
    --gpus 4 \
    --overwrite_output_dir
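
One thing I noticed: this command doesn't pass a batch size, so finetune.py falls back to whatever default its argument parser defines, while the run_summarization.py command below pins --per_device_train_batch_size=32. Assuming the legacy script still prints its argparse help (it did in the tfmr3 seq2seq examples), the default is quick to check:

# Hypothetical check: dump the argparse help and look around the
# batch-size options (assumes the tfmr3 seq2seq examples layout).
python finetune.py --help | grep -i -B1 -A2 "batch"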

For the new run_summarization.py:

python tfmr4/run_summarization.py \
    --model_name_or_path "tuner007/pegasus_paraphrase" \
    --cache_dir $CACHE_DIR \
    --train_file $TRAIN_FILE \
    --validation_file $VAL_FILE \
    --test_file $TEST_FILE \
    --output_dir $MODEL_OUTPUT_DIR \
    --learning_rate=1e-4 \
    --num_train_epochs=1 \
    --per_device_train_batch_size=32 \
    --per_device_eval_batch_size=32 \
    --do_train \
    --do_predict \
    --max_source_length 512 \
    --max_target_length 56 \
    --label_smoothing 0.1 \
    --adafactor \
    --overwrite_output_dir
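
Another difference that might account for part of the speed gap: the finetune.py run spreads over 4 GPUs via --gpus 4 (PyTorch Lightning handles that), while the command above launches run_summarization.py as a plain python process. If I understand the Trainer correctly, that falls back to DataParallel across whatever GPUs are visible, and you have to go through the distributed launcher to get DistributedDataParallel. A sketch of the equivalent DDP launch, assuming 4 GPUs and otherwise identical flags:

# Hypothetical DDP launch on 4 GPUs; all training flags are the
# same as in the plain-python command above.
python -m torch.distributed.launch --nproc_per_node=4 \
    tfmr4/run_summarization.py \
    --model_name_or_path "tuner007/pegasus_paraphrase" \
    --cache_dir $CACHE_DIR \
    --train_file $TRAIN_FILE \
    --validation_file $VAL_FILE \
    --test_file $TEST_FILE \
    --output_dir $MODEL_OUTPUT_DIR \
    --learning_rate=1e-4 \
    --num_train_epochs=1 \
    --per_device_train_batch_size=32 \
    --per_device_eval_batch_size=32 \
    --do_train \
    --do_predict \
    --max_source_length 512 \
    --max_target_length 56 \
    --label_smoothing 0.1 \
    --adafactor \
    --overwrite_output_dir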

There were a couple of configs that exist in finetune.py but are no longer present in run_summarization.py. I will also go through all the possible configurations for the two scripts and spot any differences.
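
Since both scripts build their CLIs on argparse-style parsers, one quick (hypothetical) way to enumerate every option each one accepts and eyeball the differences:

# Hypothetical side-by-side dump of each script's accepted options.
python finetune.py --help > tfmr3_args.txt
python tfmr4/run_summarization.py --help > tfmr4_args.txt
diff tfmr3_args.txt tfmr4_args.txt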