I want to use the data batch iteration number to save the model and print logs, but I can't get the total iteration number across multi-GPU devices. It seems `current_iter` only counts the iterations on a single GPU. My code is:
```python
for epoch in range(start_epoch, total_epoch + 1):
    if resume_state and epoch == start_epoch and current_iter is not None:
        # We need to skip steps until we reach the resumed step
        active_dataloader = accelerator.skip_first_batches(train_loader, current_iter)
    else:
        active_dataloader = train_loader
    for idx, train_data in enumerate(active_dataloader):
        data_timer.record()
        current_iter += 1  # intended to track the total iteration number
        model.feed_data(train_data)
        model.optimize_parameters()
        ...
        if current_iter % opt['logger']['print_freq'] == 0:
            log_vars = {'epoch': epoch, 'iter': current_iter}
            log_vars.update({'lrs': model.get_current_learning_rate()})
            log_vars.update({'time': iter_timer.get_avg_time(),
                             'data_time': data_timer.get_avg_time()})
            log_vars.update(model.get_current_log())
            msg_logger(log_vars)
        # save models and training states
        if current_iter % opt['logger']['save_checkpoint_freq'] == 0:
            logger.info('Saving models and training states.')
            model.save_state(epoch, current_iter)
```
I am not sure whether `current_iter += 1` actually gives me the total iteration number across all GPUs, or only the number of batches seen by this one process. Is this the right way to count?
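For reference, this is a minimal sanity check I could add to compare the counters across processes (a sketch only; it assumes `accelerator` is the same `Accelerator` instance used above and that `train_loader` was sharded via `accelerator.prepare`):

```python
import torch

# Gather the per-process counter onto every process. If the dataloader is
# sharded by accelerator.prepare(), every process should report the same
# current_iter, and the global number of batches consumed this step is
# current_iter * accelerator.num_processes.
iter_tensor = torch.tensor([current_iter], device=accelerator.device)
all_iters = accelerator.gather(iter_tensor)  # shape: (num_processes,)
if accelerator.is_main_process:
    print(f"per-process iters: {all_iters.tolist()}, "
          f"global batches seen: {int(all_iters.sum())}")
```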