It loads a model onto multiple GPUs. Once loaded, the model can be run forward or backward. So far I have only used `device_map="auto"` for training, and it works.
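For context, here is a minimal sketch of what I mean (the checkpoint name and input are placeholders; any causal LM should behave the same way):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" shards the weights across all visible GPUs
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# Both a forward and a backward pass run when the whole model
# fits on the GPUs (no CPU/disk offloading involved).
inputs = tokenizer("Hello world", return_tensors="pt").to(model.device)
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()
```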
If you are referring to this section of the docs:
> This only supports the inference of your model, not training. Most of the computation happens behind `torch.no_grad()` context managers to avoid spending some GPU memory with intermediate activations.
I think that caveat only applies to the CPU/disk offloading mechanism, not to the case where the full model fits across several GPUs.
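You can check which case you are in by inspecting where each submodule was placed. A sketch (the memory caps are assumptions chosen to force offloading on a small model; adjust them to your hardware):

```python
from transformers import AutoModelForCausalLM

# Capping GPU 0 below the model size forces some weights onto the CPU.
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",  # placeholder checkpoint
    device_map="auto",
    max_memory={0: "200MiB", "cpu": "30GiB"},
    offload_folder="offload",  # only used if weights spill to disk
)

# Any "cpu" or "disk" entries here mean offloading is active,
# which is the situation the no_grad caveat covers.
print(model.hf_device_map)
```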