I am currently training a summarization model in the background with nohup bash ~.
Since nohup captures every tqdm progress-bar refresh, the output file grows far too large. I am fine with the data-mapping and training logs, but there are some extremely long logs in between the training logs.
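For context, the job is launched roughly like the line below (the script name is a placeholder, not my exact command). As far as I understand, tqdm writes its bars to stderr, and nohup sends both stdout and stderr into the log file, so every bar refresh becomes another line:

nohup bash train.sh &   # placeholder script name; stdout and stderr both end up in nohup.out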
Right now I am using the Trainer from the transformers library together with wandb.
I can't identify what this progress bar is…
Here is the code snippet:
if args.do_train:
    wandb.init(
        name=f"{model_name_only}-data:{args.dataset}:{args.train_size}-{random_num}",
        project=f'{model_name_only}-{args.train_rl_size}-{random_num}',
        settings=wandb.Settings(_service_wait=3000),
    )
    print('train bart..')
    seq2seq_data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)
    print(f'output directory: {output_dir}')
    training_args = Seq2SeqTrainingArguments(
        output_dir=args.output_model_dir,
        num_train_epochs=args.epoch,
        warmup_steps=500,
        per_device_train_batch_size=args.train_batch_size,
        per_device_eval_batch_size=args.test_batch_size,
        weight_decay=0.01,
        logging_steps=500,
        evaluation_strategy='steps',
        eval_steps=500,
        save_steps=1e6,
        predict_with_generate=True,
        remove_unused_columns=True,
        hub_model_id=output_dir.split('/')[-1],
        push_to_hub=args.push_to_hub,
        gradient_accumulation_steps=16,
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=training_args,
        tokenizer=tokenizer,
        data_collator=seq2seq_data_collator,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        compute_metrics=compute_metrics,
    )
    trainer.train()
    print('done')

    if args.push_to_hub:
        trainer.save_model(output_dir)
        print(f'save model to {output_dir}')
        trainer.push_to_hub()
        print('push model to hub')

if args.do_rl:
And here is the nohup log…
Map: 93%|█████████▎| 4612/4969 [00:04<00:00, 1227.55 examples/s]
Map: 96%|█████████▌| 4778/4969 [00:04<00:00, 1317.31 examples/s]
Map: 100%|██████████| 4969/4969 [00:04<00:00, 1108.47 examples/s]
Map: 100%|██████████| 4969/4969 [00:04<00:00, 1130.69 examples/s]
wandb: Currently logged in as: baek26. Use `wandb login --relogin` to force relogin
wandb: wandb version 0.16.6 is available! To upgrade, please run:
wandb: $ pip install wandb --upgrade
wandb: Tracking run with wandb version 0.16.4
wandb: Run data is saved locally in /hdd/hdd2/baek26/Ours/MDO/wandb/run-20240417_001446-a6ortut6
wandb: Run `wandb offline` to turn off syncing.
wandb: Syncing run facebook-bart-base-data:all:None-323
wandb: ⭐️ View project at https://wandb.ai/baek26/facebook-bart-base-None-323
wandb: 🚀 View run at https://wandb.ai/baek26/facebook-bart-base-None-323/runs/a6ortut6
/home/guest-bje/.local/share/virtualenvs/Ours-2zhE1riw/lib/python3.8/site-packages/accelerate/accelerator.py:432: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead:
dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
warnings.warn(
Detected kernel version 4.15.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
done..
train bart..
output directory: ./checkpoints/MDO/all_323_bart-base
0%| | 0/49760 [00:00<?, ?it/s]
0%| | 1/49760 [00:06<83:29:06, 6.04s/it]
0%| | 2/49760 [00:08<56:15:16, 4.07s/it]
0%| | 3/49760 [00:11<47:42:26, 3.45s/it]
0%| | 4/49760 [00:14<43:31:26, 3.15s/it]
0%| | 5/49760 [00:16<41:31:17, 3.00s/it]
0%| | 6/49760 [00:19<40:14:46, 2.91s/it]
0%| | 7/49760 [00:22<39:29:00, 2.86s/it]
0%| | 8/49760 [00:25<38:44:46, 2.80s/it]
0%| | 9/49760 [00:27<38:36:48, 2.79s/it]
0%| | 10/49760 [00:30<38:23:16, 2.78s/it]
0%| | 10/49760 [00:30<38:23:16, 2.78s/it]
0%| | 11/49760 [00:33<38:11:39, 2.76s/it]
0%| | 12/49760 [00:35<37:44:02, 2.73s/it]
0%| | 13/49760 [00:38<37:30:46, 2.71s/it]
0%| | 14/49760 [00:41<37:18:40, 2.70s/it]
...
1%| | 491/49760 [23:04<36:14:18, 2.65s/it]
1%| | 492/49760 [23:07<36:11:57, 2.65s/it]
1%| | 493/49760 [23:09<36:15:00, 2.65s/it]
1%| | 494/49760 [23:12<36:11:42, 2.64s/it]
1%| | 495/49760 [23:14<36:08:02, 2.64s/it]
1%| | 496/49760 [23:17<36:06:57, 2.64s/it]
1%| | 497/49760 [23:20<36:14:44, 2.65s/it]
1%| | 498/49760 [23:23<36:33:00, 2.67s/it]
1%| | 499/49760 [23:25<36:58:12, 2.70s/it]
1%| | 500/49760 [23:28<37:10:19, 2.72s/it]
1%| | 500/49760 [23:28<37:10:19, 2.72s/it]/home/guest-bje/.local/share/virtualenvs/Ours-2zhE1riw/lib/python3.8/site-packages/transformers/generation/utils.py:1178: UserWarning: Using the model-agnostic default `max_length` (=20) to control the generation length. We recommend setting `max_new_tokens` to control the maximum length of the generation.
warnings.warn(
{'loss': 9.467, 'grad_norm': 24.954456329345703, 'learning_rate': 1.0000000000000002e-06, 'epoch': 0.0}
{'loss': 8.9265, 'grad_norm': 16.096410751342773, 'learning_rate': 2.0000000000000003e-06, 'epoch': 0.0}
{'loss': 8.3355, 'grad_norm': 11.878803253173828, 'learning_rate': 3e-06, 'epoch': 0.01}
{'loss': 7.7632, 'grad_norm': 10.747937202453613, 'learning_rate': 4.000000000000001e-06, 'epoch': 0.01}
{'loss': 7.2561, 'grad_norm': 9.314805030822754, 'learning_rate': 5e-06, 'epoch': 0.01}
{'loss': 6.9627, 'grad_norm': 12.294201850891113, 'learning_rate': 6e-06, 'epoch': 0.01}
{'loss': 6.4995, 'grad_norm': 16.7220401763916, 'learning_rate': 7.000000000000001e-06, 'epoch': 0.01}
{'loss': 5.8852, 'grad_norm': 22.519121170043945, 'learning_rate': 8.000000000000001e-06, 'epoch': 0.02}
{'loss': 5.0685, 'grad_norm': 23.565885543823242, 'learning_rate': 9e-06, 'epoch': 0.02}
{'loss': 4.5524, 'grad_norm': 22.735116958618164, 'learning_rate': 1e-05, 'epoch': 0.02}
{'loss': 4.1538, 'grad_norm': 22.052845001220703, 'learning_rate': 1.1000000000000001e-05, 'epoch': 0.02}
...
{'loss': 1.1735, 'grad_norm': 1.2184255123138428, 'learning_rate': 4.8e-05, 'epoch': 0.1}
{'loss': 1.2038, 'grad_norm': 1.1376460790634155, 'learning_rate': 4.9e-05, 'epoch': 0.1}
{'loss': 1.2235, 'grad_norm': 1.3213053941726685, 'learning_rate': 5e-05, 'epoch': 0.1}
0%| | 0/3777 [00:00<?, ?it/s]e[A
0%| | 2/3777 [00:00<10:46, 5.84it/s]e[A
0%| | 3/3777 [00:00<20:53, 3.01it/s]e[A
0%| | 4/3777 [00:01<21:45, 2.89it/s]e[A
0%| | 5/3777 [00:01<28:40, 2.19it/s]e[A
0%| | 6/3777 [00:02<25:43, 2.44it/s]e[A
0%| | 7/3777 [00:02<24:17, 2.59it/s]e[A
0%| | 8/3777 [00:03<24:52, 2.53it/s]e[A
0%| | 9/3777 [00:03<28:24, 2.21it/s]e[A
0%| | 10/3777 [00:04<29:12, 2.15it/s]e[A
0%| | 11/3777 [00:04<26:26, 2.37it/s]e[A
...
99%|█████████▉| 3754/3777 [22:35<00:17, 1.31it/s]e[A
99%|█████████▉| 3755/3777 [22:35<00:13, 1.62it/s]e[A
99%|█████████▉| 3756/3777 [22:36<00:13, 1.55it/s]e[A
99%|█████████▉| 3757/3777 [22:37<00:14, 1.35it/s]e[A
99%|█████████▉| 3758/3777 [22:38<00:13, 1.41it/s]e[A
100%|█████████▉| 3759/3777 [22:38<00:10, 1.77it/s]e[A
100%|█████████▉| 3760/3777 [22:39<00:10, 1.56it/s]e[A
100%|█████████▉| 3761/3777 [22:40<00:11, 1.38it/s]e[A
100%|█████████▉| 3762/3777 [22:40<00:10, 1.49it/s]e[A
100%|█████████▉| 3763/3777 [22:40<00:07, 1.85it/s]e[A
100%|█████████▉| 3764/3777 [22:41<00:07, 1.85it/s]e[A
100%|█████████▉| 3765/3777 [22:42<00:07, 1.56it/s]e[A
100%|█████████▉| 3766/3777 [22:43<00:07, 1.44it/s]e[A
100%|█████████▉| 3767/3777 [22:43<00:05, 1.76it/s]e[A
100%|█████████▉| 3768/3777 [22:43<00:04, 1.91it/s]e[A
100%|█████████▉| 3769/3777 [22:44<00:05, 1.59it/s]e[A
100%|█████████▉| 3770/3777 [22:45<00:05, 1.38it/s]e[A
100%|█████████▉| 3771/3777 [22:45<00:03, 1.70it/s]e[A
100%|█████████▉| 3772/3777 [22:46<00:02, 2.05it/s]e[A
100%|█████████▉| 3773/3777 [22:47<00:02, 1.61it/s]e[A
100%|█████████▉| 3774/3777 [22:48<00:02, 1.31it/s]e[A
100%|█████████▉| 3775/3777 [22:48<00:01, 1.65it/s]e[A
100%|█████████▉| 3776/3777 [22:49<00:00, 1.40it/s]e[A
100%|██████████| 3777/3777 [22:50<00:00, 1.50it/s]e[A
e[A
100%|██████████| 3777/3777 [23:31<00:00, 1.50it/s]e[A
1%| | 500/49760 [47:00<37:10:19, 2.72s/it]
e[A
1%| | 501/49760 [47:04<5838:06:57, 426.67s/it]
1%| | 502/49760 [47:07<4100:50:18, 299.71s/it]
1%| | 503/49760 [47:10<2881:12:24, 210.58s/it]
1%| | 504/49760 [47:13<2027:26:41, 148.18s/it]
1%| | 505/49760 [47:15<1429:52:12, 104.51s/it]
1%| | 506/49760 [47:18<1011:33:41, 73.94s/it]
1%| | 507/49760 [47:20<718:43:33, 52.53s/it
From this example, you can see that the lines counting up to 3777 each produce an extra empty line per log entry, and they are written far too frequently… It is so annoying. What is this progress bar, and how can I fix it?
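One thing I have been considering (just a sketch based on the TrainingArguments and datasets documentation; I am not sure whether it actually suppresses the 3777 bar or only the outer training bar) is turning the progress bars off explicitly, roughly like this:

from datasets.utils.logging import disable_progress_bar
from transformers import Seq2SeqTrainingArguments

disable_progress_bar()  # should hide the datasets "Map: ..." bars

training_args = Seq2SeqTrainingArguments(
    output_dir=args.output_model_dir,   # same args object as in my snippet above
    disable_tqdm=True,                  # supposed to disable the Trainer's tqdm bars
    logging_steps=500,
    evaluation_strategy='steps',
    eval_steps=500,
)

Would that be the right approach, or is there a cleaner way to keep the loss/eval logs while dropping the bars?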