Hi,
I’m using the relatively new `model.parallelize()` feature for T5 models. At some point I need to save a model checkpoint and load it later to continue training. How can I do both? With `model.load_state_dict`, everything loads onto the device I provide, but how can the checkpoint be loaded across several devices, the way `model.parallelize()` splits the model?
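For illustration, here is a minimal sketch of the save/reload flow in question. It uses plain PyTorch rather than the T5 API. `TwoBlockModel` and its `place()` method are hypothetical stand-ins for a model split across devices in the spirit of `model.parallelize()`. The sketch relies on the fact that `load_state_dict` (with its default settings) copies checkpoint values into the model's existing parameters, so splitting the model across devices *before* loading keeps each parameter on its assigned device. With a real T5 model, the analogous order would presumably be to call `model.parallelize()` first and then `model.load_state_dict(...)`. That is an untested assumption.

```python
import io

import torch
import torch.nn as nn


class TwoBlockModel(nn.Module):
    """Hypothetical stand-in for a model split across devices (not the transformers code)."""

    def __init__(self):
        super().__init__()
        self.block0 = nn.Linear(4, 4)
        self.block1 = nn.Linear(4, 2)

    def place(self, devices):
        # Assumption: simple manual placement, one block per device.
        self.devices = devices
        self.block0.to(devices[0])
        self.block1.to(devices[1])
        return self

    def forward(self, x):
        x = self.block0(x.to(self.devices[0]))
        return self.block1(x.to(self.devices[1]))


def pick_devices():
    # Use two GPUs if available, otherwise fall back to CPU for both blocks.
    if torch.cuda.device_count() >= 2:
        return [torch.device("cuda:0"), torch.device("cuda:1")]
    return [torch.device("cpu"), torch.device("cpu")]


devices = pick_devices()
model = TwoBlockModel().place(devices)

# Save the state dict (an in-memory buffer stands in for a checkpoint file).
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Load all tensors onto the CPU first, so no single GPU holds the whole checkpoint.
state = torch.load(buffer, map_location="cpu")

# Build a fresh model, split it across devices, then load: values are copied
# into the already-placed parameters, so their devices are preserved.
restored = TwoBlockModel().place(devices)
restored.load_state_dict(state)
```

Loading with `map_location="cpu"` and then copying into already-placed parameters avoids materializing the full checkpoint on one GPU.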