No loss being logged when running the MLM script (Colab)

When running the run_mlm.py script through xla_spawn.py, I find that, despite redirecting its output to log files, I still don’t get step-by-step output of the metrics.

%%bash
python xla_spawn.py --num_cores=8 ./run_mlm.py --output_dir="./results" \
    --model_type="big_bird" \
    --config_name="./config" \
    --tokenizer_name="./tokenizer" \
    --train_file="./dataset.txt" \
    --validation_file="./val.txt" \
    --line_by_line="True" \
    --max_seq_length="16000" \
    --weight_decay="0.01" \
    --per_device_train_batch_size="1" \
    --per_device_eval_batch_size="1" \
    --learning_rate="3e-4" \
    --tpu_num_cores="8" \
    --warmup_steps="1000" \
    --overwrite_output_dir \
    --pad_to_max_length \
    --num_train_epochs="1" \
    --adam_beta1="0.9" \
    --adam_beta2="0.98" \
    --do_train \
    --do_eval \
    --logging_steps="10" \
    --evaluation_strategy="steps" \
    --eval_accumulation_steps="10" \
    --report_to="tensorboard" \
    --logging_dir="./logs" \
    --save_strategy="epoch" \
    --load_best_model_at_end="True" \
    --metric_for_best_model="accuracy" \
    --skip_memory_metrics="False" \
    --gradient_accumulation_steps="500" \
    --use_fast_tokenizer="True" \
    --log_level="info" \
    --logging_first_step="True" \
    1> >(tee -a stdout.log) \
    2> >(tee -a stderr.log >&2)

As you can see, I am redirecting stdout and stderr to log files, but no per-step metrics appear in them - only the end-of-epoch ones once training has finished. TensorBoard doesn’t help either, since the loss is never being logged in the first place :thinking: which is quite weird.

I have adjusted logging_steps, but that doesn’t seem to help. I am quite confused - the Trainer is supposed to output the loss to the cell output as well, but that doesn’t happen either.
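
For reference, this is how I understand the logging-related flags above would map onto TrainingArguments if built directly in Python - a minimal sketch with placeholder paths, in case I have misread how these options are supposed to combine:

```python
from transformers import TrainingArguments

# Minimal sketch of the logging-related settings only (paths are placeholders);
# the remaining flags from the command above are omitted for brevity.
training_args = TrainingArguments(
    output_dir="./results",
    logging_dir="./logs",         # where TensorBoard event files are written
    report_to="tensorboard",
    logging_strategy="steps",     # the default, stated explicitly here
    logging_steps=10,             # log every 10 global (optimizer) steps
    logging_first_step=True,
    evaluation_strategy="steps",
    log_level="info",
)
```

My expectation is that with logging_strategy="steps" and logging_steps=10, the loss should show up both in the console output and in the TensorBoard logs every 10 global steps.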

Does anyone know how I can get the metrics logged every ‘n’ steps?