This is an interesting question! I don't think increasing the batch size while reusing the optimizer states from the previous training run would be an issue. You could either carry over the old optimizer states or start with fresh gradient statistics; both should work, imo.
Worst case, if you're limited on compute, you could resume with a small learning rate for a bit and then increase it once you're confident the gradient statistics have adapted to the new batch size. I haven't seen this as a formal recommendation anywhere; it's more of a cheap "might as well" precaution.
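
If it helps, here's a rough sketch of what I mean in PyTorch. The checkpoint keys, warmup length, toy model, and placeholder loss are all just assumptions for illustration, not anything from your setup:

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins; swap in your own model, data, and checkpoint path.
model = nn.Linear(128, 10)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

ckpt = torch.load("checkpoint.pt")            # assumed checkpoint layout
model.load_state_dict(ckpt["model"])
optimizer.load_state_dict(ckpt["optimizer"])  # keep old Adam moments; skip this line for fresh stats

target_lr = 3e-4
warmup_steps = 500  # arbitrary; a few hundred large-batch steps is usually plenty

def warmup_lr(step: int) -> float:
    """Linear ramp from 10% of the target LR back to the full value."""
    if step >= warmup_steps:
        return target_lr
    return target_lr * (0.1 + 0.9 * step / warmup_steps)

# Resumed training loop; the data loader now yields the larger batches.
for step in range(1000):
    batch = torch.randn(1024, 128)            # placeholder for a real large batch
    for group in optimizer.param_groups:
        group["lr"] = warmup_lr(step)
    loss = model(batch).pow(2).mean()         # placeholder loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

The warmup just buys the second-moment estimates a few hundred steps to settle under the new batch size before you go back to the full learning rate; if you start from fresh optimizer states instead, the same ramp does no harm.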