How can I hide the progress bar for evaluation only?

I am training a summarization model with nohup bash ~
Since nohup captures all of the tqdm output, the log file grows far too large. I am fine with the data mapping and training logs, but there are some very long logs mixed in between them.

I am using the Trainer from Transformers together with wandb.

I can’t identify what this progress bar is…
The code snippet is here:


    if args.do_train:
        wandb.init(name=f"{model_name_only}-data:{args.dataset}:{args.train_size}-{random_num}", project=f'{model_name_only}-{args.train_rl_size}-{random_num}', settings=wandb.Settings(_service_wait=3000))
        print('train bart..')
        seq2seq_data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
        print(f'output directory: {output_dir}')
        training_args = Seq2SeqTrainingArguments(
            output_dir=args.output_model_dir,
            num_train_epochs=args.epoch,
            warmup_steps=500,
            per_device_train_batch_size=args.train_batch_size,
            per_device_eval_batch_size=args.test_batch_size,
            weight_decay=0.01,
            logging_steps=500,
            evaluation_strategy='steps',
            eval_steps=500,
            save_steps=1e6,
            predict_with_generate=True,
            remove_unused_columns=True,
            hub_model_id=output_dir.split('/')[-1],
            push_to_hub=args.push_to_hub,
            gradient_accumulation_steps=16
        )

        trainer = Seq2SeqTrainer(
            model=model,
            args=training_args,
            tokenizer=tokenizer,
            data_collator=seq2seq_data_collator,
            train_dataset=train_dataset,
            eval_dataset=eval_dataset,
            compute_metrics=compute_metrics,
        )
        
        trainer.train()
        print('done')

        if args.push_to_hub:
            trainer.save_model(output_dir)
            print(f'save model to {output_dir}')
            trainer.push_to_hub()
            print('push model to hub')
    
    if args.do_rl:

And here is the nohup log…

Map:  93%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Ž| 4612/4969 [00:04<00:00, 1227.55 examples/s]
Map:  96%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–Œ| 4778/4969 [00:04<00:00, 1317.31 examples/s]
Map: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4969/4969 [00:04<00:00, 1108.47 examples/s]
Map: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 4969/4969 [00:04<00:00, 1130.69 examples/s]
wandb: Currently logged in as: baek26. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.16.6 is available!  To upgrade, please run:
wandb:  $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.16.4
wandb: Run data is saved locally in /hdd/hdd2/baek26/Ours/MDO/wandb/run-20240417_001446-a6ortut6
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run facebook-bart-base-data:all:None-323
wandb: ⭐️ View project at https://wandb.ai/baek26/facebook-bart-base-None-323
wandb: πŸš€ View run at https://wandb.ai/baek26/facebook-bart-base-None-323/runs/a6ortut6
/home/guest-bje/.local/share/virtualenvs/Ours-2zhE1riw/lib/python3.8/site-packages/accelerate/accelerator.py:432: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead: 
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
  warnings.warn(
Detected kernel version 4.15.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
done..
train bart..
output directory: ./checkpoints/MDO/all_323_bart-base

  0%|          | 0/49760 [00:00<?, ?it/s]
  0%|          | 1/49760 [00:06<83:29:06,  6.04s/it]
  0%|          | 2/49760 [00:08<56:15:16,  4.07s/it]
  0%|          | 3/49760 [00:11<47:42:26,  3.45s/it]
  0%|          | 4/49760 [00:14<43:31:26,  3.15s/it]
  0%|          | 5/49760 [00:16<41:31:17,  3.00s/it]
  0%|          | 6/49760 [00:19<40:14:46,  2.91s/it]
  0%|          | 7/49760 [00:22<39:29:00,  2.86s/it]
  0%|          | 8/49760 [00:25<38:44:46,  2.80s/it]
  0%|          | 9/49760 [00:27<38:36:48,  2.79s/it]
  0%|          | 10/49760 [00:30<38:23:16,  2.78s/it]
                                                     

  0%|          | 10/49760 [00:30<38:23:16,  2.78s/it]
  0%|          | 11/49760 [00:33<38:11:39,  2.76s/it]
  0%|          | 12/49760 [00:35<37:44:02,  2.73s/it]
  0%|          | 13/49760 [00:38<37:30:46,  2.71s/it]
  0%|          | 14/49760 [00:41<37:18:40,  2.70s/it]
...
1%|          | 491/49760 [23:04<36:14:18,  2.65s/it]
  1%|          | 492/49760 [23:07<36:11:57,  2.65s/it]
  1%|          | 493/49760 [23:09<36:15:00,  2.65s/it]
  1%|          | 494/49760 [23:12<36:11:42,  2.64s/it]
  1%|          | 495/49760 [23:14<36:08:02,  2.64s/it]
  1%|          | 496/49760 [23:17<36:06:57,  2.64s/it]
  1%|          | 497/49760 [23:20<36:14:44,  2.65s/it]
  1%|          | 498/49760 [23:23<36:33:00,  2.67s/it]
  1%|          | 499/49760 [23:25<36:58:12,  2.70s/it]
  1%|          | 500/49760 [23:28<37:10:19,  2.72s/it]
                                                      

  1%|          | 500/49760 [23:28<37:10:19,  2.72s/it]/home/guest-bje/.local/share/virtualenvs/Ours-2zhE1riw/lib/python3.8/site-packages/transformers/generation/utils.py:1178: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
  warnings.warn(
{'loss': 9.467, 'grad_norm': 24.954456329345703, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.0}
{'loss': 8.9265, 'grad_norm': 16.096410751342773, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.0}
{'loss': 8.3355, 'grad_norm': 11.878803253173828, 'learning_rate': 3e-06, 'epoch': 0.01}
{'loss': 7.7632, 'grad_norm': 10.747937202453613, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.01}
{'loss': 7.2561, 'grad_norm': 9.314805030822754, 'learning_rate': 5e-06, 'epoch': 0.01}
{'loss': 6.9627, 'grad_norm': 12.294201850891113, 'learning_rate': 6e-06, 'epoch': 0.01}
{'loss': 6.4995, 'grad_norm': 16.7220401763916, 'learning_rate': 7.000000000000001e-06, 'epoch': 0.01}
{'loss': 5.8852, 'grad_norm': 22.519121170043945, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.02}
{'loss': 5.0685, 'grad_norm': 23.565885543823242, 'learning_rate': 9e-06, 'epoch': 0.02}
{'loss': 4.5524, 'grad_norm': 22.735116958618164, 'learning_rate': 1e-05, 'epoch': 0.02}
{'loss': 4.1538, 'grad_norm': 22.052845001220703, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.02}
...
{'loss': 1.1735, 'grad_norm': 1.2184255123138428, 'learning_rate': 4.8e-05, 'epoch': 0.1}
{'loss': 1.2038, 'grad_norm': 1.1376460790634155, 'learning_rate': 4.9e-05, 'epoch': 0.1}
{'loss': 1.2235, 'grad_norm': 1.3213053941726685, 'learning_rate': 5e-05, 'epoch': 0.1}


  0%|          | 0/3777 [00:00<?, ?it/s]e[A

  0%|          | 2/3777 [00:00<10:46,  5.84it/s]e[A

  0%|          | 3/3777 [00:00<20:53,  3.01it/s]e[A

  0%|          | 4/3777 [00:01<21:45,  2.89it/s]e[A

  0%|          | 5/3777 [00:01<28:40,  2.19it/s]e[A

  0%|          | 6/3777 [00:02<25:43,  2.44it/s]e[A

  0%|          | 7/3777 [00:02<24:17,  2.59it/s]e[A

  0%|          | 8/3777 [00:03<24:52,  2.53it/s]e[A

  0%|          | 9/3777 [00:03<28:24,  2.21it/s]e[A

  0%|          | 10/3777 [00:04<29:12,  2.15it/s]e[A

  0%|          | 11/3777 [00:04<26:26,  2.37it/s]e[A
...
99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3754/3777 [22:35<00:17,  1.31it/s]e[A

 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3755/3777 [22:35<00:13,  1.62it/s]e[A

 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3756/3777 [22:36<00:13,  1.55it/s]e[A

 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3757/3777 [22:37<00:14,  1.35it/s]e[A

 99%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3758/3777 [22:38<00:13,  1.41it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3759/3777 [22:38<00:10,  1.77it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3760/3777 [22:39<00:10,  1.56it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3761/3777 [22:40<00:11,  1.38it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3762/3777 [22:40<00:10,  1.49it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3763/3777 [22:40<00:07,  1.85it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3764/3777 [22:41<00:07,  1.85it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3765/3777 [22:42<00:07,  1.56it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3766/3777 [22:43<00:07,  1.44it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3767/3777 [22:43<00:05,  1.76it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3768/3777 [22:43<00:04,  1.91it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3769/3777 [22:44<00:05,  1.59it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3770/3777 [22:45<00:05,  1.38it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3771/3777 [22:45<00:03,  1.70it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3772/3777 [22:46<00:02,  2.05it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3773/3777 [22:47<00:02,  1.61it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3774/3777 [22:48<00:02,  1.31it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3775/3777 [22:48<00:01,  1.65it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‰| 3776/3777 [22:49<00:00,  1.40it/s]e[A

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3777/3777 [22:50<00:00,  1.50it/s]e[A

                                                   
e[A
                                                      

100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 3777/3777 [23:31<00:00,  1.50it/s]e[A
  1%|          | 500/49760 [47:00<37:10:19,  2.72s/it]

                                                   e[A
  1%|          | 501/49760 [47:04<5838:06:57, 426.67s/it]
  1%|          | 502/49760 [47:07<4100:50:18, 299.71s/it]
  1%|          | 503/49760 [47:10<2881:12:24, 210.58s/it]
  1%|          | 504/49760 [47:13<2027:26:41, 148.18s/it]
  1%|          | 505/49760 [47:15<1429:52:12, 104.51s/it]
  1%|          | 506/49760 [47:18<1011:33:41, 73.94s/it] 
  1%|          | 507/49760 [47:20<718:43:33, 52.53s/it

From this example, you can see that the lines counting up to 3777 (the evaluation progress bar) produce an extra empty line for every update, and they are printed far too frequently… It is so annoying. What is this progress bar, and how can I fix it?

Hi,

You can override the `on_prediction_step` method of the `ProgressCallback` class and customize what gets logged.

Here is the method as it currently is; you can remove or edit whatever you would like.

    from tqdm.auto import tqdm
    from transformers import ProgressCallback
    from transformers.trainer_utils import has_length

    class ProgressOverider(ProgressCallback):
        def on_prediction_step(self, args, state, control, eval_dataloader=None, **kwargs):
            # Default behaviour: only the main process creates and updates the eval bar.
            # Remove or edit this body to change what is printed during evaluation.
            if state.is_world_process_zero and has_length(eval_dataloader):
                if self.prediction_bar is None:
                    self.prediction_bar = tqdm(
                        total=len(eval_dataloader), dynamic_ncols=True
                    )
                self.prediction_bar.update(1)
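
For instance, if the goal is simply to silence the evaluation bar (as in the question), a minimal sketch could skip creating the bar entirely; the class name `SilentEvalProgress` is just an example:

    class SilentEvalProgress(ProgressCallback):
        def on_prediction_step(self, args, state, control, eval_dataloader=None, **kwargs):
            # Create no tqdm bar for evaluation, so nothing is written to the nohup log.
            pass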

You then have to remove the previous callback and add your new one to the trainer.

    # Find the default ProgressCallback, remove it, and register the override instead
    progress_callback = next(
        (cb for cb in trainer.callback_handler.callbacks if isinstance(cb, ProgressCallback)),
        None,
    )
    trainer.remove_callback(progress_callback)
    trainer.add_callback(ProgressOverider)
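
In the snippet from the question, this swap goes right after the `Seq2SeqTrainer` is constructed and before `trainer.train()` is called. Note that `Trainer.add_callback` accepts either a callback class or an instance, so passing `ProgressOverider` directly works.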

Hope this helps!