Resuming Accelerate-based pretraining with a different batch size

TL;DR: Any tips on running additional epochs of MLM pretraining via Accelerate with a different batch size than the original run?

I'm looking for tips on working around the issues raised in the following GitHub Q&A item and feature request:
how to continue training from a checkpoint with Trainer? 
issue warning about different batch size being used for --resume_from_checkpoint

My problem is that I want to resume a costly masked language model (MLM) pretraining run on an AWS 4-GPU server. The run completed one epoch with a small batch size, and I want to run a few more epochs with a larger batch size for better throughput.

This is not supported, because of the way the script recalculates the resume step when the batch size differs:
transformers/ at main · huggingface/transformers · GitHub
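
To make the mismatch concrete, here is a rough arithmetic sketch. It assumes the resume logic works roughly like this: take the step count stored with the checkpoint, then divide it by the length of the current dataloader to get the starting epoch and the number of batches to skip. That is a simplified model and not the script's actual code; the function name, dataset size, and batch sizes are all made up for illustration.

```python
import math

def resume_position(checkpoint_step, dataset_size, global_batch_size, grad_accum=1):
    """Hypothetical model of step-based resume arithmetic.

    checkpoint_step: optimizer steps recorded when the checkpoint was saved
    global_batch_size: per-device batch size times number of devices
    """
    batches_per_epoch = math.ceil(dataset_size / global_batch_size)
    resume_step = checkpoint_step * grad_accum
    starting_epoch = resume_step // batches_per_epoch
    step_in_epoch = resume_step - starting_epoch * batches_per_epoch
    return starting_epoch, step_in_epoch

# One full epoch over 1,000,000 samples at 8 per GPU x 4 GPUs = 31,250 steps.
same_batch = resume_position(31_250, 1_000_000, 8 * 4)
# Same checkpoint, but resuming at 32 per GPU x 4 GPUs.
larger_batch = resume_position(31_250, 1_000_000, 32 * 4)

print(same_batch)    # (1, 0): start of the second epoch, as intended
print(larger_batch)  # (3, 7811): three epochs appear done, most of the fourth is skipped
```

Under these assumptions, the old step count gets interpreted against the shorter dataloader, so the run believes it is much further along than it really is.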

As a workaround, I reset the step to 0 after the script finishes this recalculation. I went this route because I couldn’t find comparable support for restarting from the last checkpoint in the Trainer-based version of the MLM script:
transformers/ at main · huggingface/transformers · GitHub
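
For anyone trying the same thing, here is a toy sketch of what resetting the step does. It assumes the step in question is the number of batches skipped at the start of the first resumed epoch; the helper name and the numbers are hypothetical, and a plain list stands in for the real dataloader.

```python
from itertools import islice

def batches_after_skip(batches, resume_step):
    """Return the batches left once the first `resume_step` batches are skipped."""
    return list(islice(batches, resume_step, None))

# Stand-in for one epoch of the dataloader at the new, larger batch size.
new_epoch_batches = list(range(10))

# Hypothetical value produced by the recalculation from the old step count.
calibrated_resume_step = 8
skipped_run = batches_after_skip(new_epoch_batches, calibrated_resume_step)

# The workaround: override the step after the recalculation has run.
resume_step = 0
full_run = batches_after_skip(new_epoch_batches, resume_step)

print(len(skipped_run), len(full_run))
```

With the reset, the first resumed epoch covers the whole dataset again. The model and optimizer state still come from the checkpoint; only the position in the data is discarded.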