How to load a saved model and make it parallel (T5)

Hi,
I’m using the fairly new model.parallelize feature for T5 models. At some point I need to save a model checkpoint and load it later to continue training. How can I combine the two? With model.load_state_dict everything loads onto the single device I provide, so how can the weights be loaded across several devices, the way model.parallelize spreads them?
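Here is a minimal sketch of the save/load cycle described above. It assumes a common approach: load the checkpoint onto the CPU first, then call parallelize again. It uses plain PyTorch and a hypothetical ToyStack module whose parallelize(devices) method only loosely imitates the T5 one. It is not the transformers implementation.

The point it illustrates is that state_dict keys carry no device information. Because of that, torch.load(..., map_location="cpu") followed by load_state_dict and then parallelize puts the same weights back across the devices. For the real T5 model, the analogous sequence would presumably be to build the model, call load_state_dict on the CPU, and then call model.parallelize(). That is an assumption to check against the transformers documentation.

```python
import io

import torch
import torch.nn as nn


class ToyStack(nn.Module):
    """Hypothetical stand-in for a T5 stack: a list of blocks that can be
    spread across devices, loosely mimicking T5's model.parallelize."""

    def __init__(self, n_blocks: int = 4, dim: int = 8):
        super().__init__()
        self.blocks = nn.ModuleList(nn.Linear(dim, dim) for _ in range(n_blocks))
        self.block_devices = ["cpu"] * n_blocks

    def parallelize(self, devices):
        # Assign contiguous chunks of blocks to each device (ceil division).
        per_device = -(-len(self.blocks) // len(devices))
        for i, block in enumerate(self.blocks):
            dev = devices[i // per_device]
            block.to(dev)
            self.block_devices[i] = dev

    def forward(self, x):
        # Move activations to whichever device holds the next block.
        for block, dev in zip(self.blocks, self.block_devices):
            x = block(x.to(dev))
        return x


def save_checkpoint(model, f):
    # state_dict keys are the same whether or not the model is parallelized;
    # each tensor simply remembers the device it was on.
    torch.save(model.state_dict(), f)


def load_then_parallelize(f, devices):
    model = ToyStack()
    # Load every tensor onto the CPU first, regardless of where it was saved...
    state = torch.load(f, map_location="cpu")
    model.load_state_dict(state)
    # ...then redistribute the blocks exactly as on the first run.
    model.parallelize(devices)
    return model


if __name__ == "__main__":
    # Use the visible GPUs if any, otherwise fake two devices with the CPU.
    devices = [f"cuda:{i}" for i in range(torch.cuda.device_count())] or ["cpu", "cpu"]
    torch.manual_seed(0)
    original = ToyStack()
    original.parallelize(devices)

    buf = io.BytesIO()
    save_checkpoint(original, buf)
    buf.seek(0)
    restored = load_then_parallelize(buf, devices)

    x = torch.randn(2, 8)
    print(torch.allclose(original(x).cpu(), restored(x).cpu()))
```

A file path works the same as the in-memory buffer used here. Loading to the CPU first avoids needing the original device layout at load time. If the optimizer is also checkpointed, its state would need the same care. That is not covered in this sketch.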