I see that PyTorch/XLA FSDP is supported using the Trainer API as described here:
But what if I’m using the accelerate API instead of Trainer? When I run `accelerate config` and select TPU as the platform, I don’t see any of the FSDP options that appear when I select multi-GPU. That seems to imply that accelerate doesn’t currently support FSDP on TPUs. Does that mean that if I use accelerate on a TPU pod, the parallelization strategy is just plain old (non-sharded) data parallelism? That’s a non-starter for large transformer models, since the complete model won’t fit on a single TPU.
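For reference, here is roughly the kind of section `accelerate config` writes out when I pick multi-GPU and enable FSDP (an illustrative excerpt from my own config; exact key names and values may vary by accelerate version). Nothing like the `fsdp_config` block is offered when TPU is selected:

```yaml
# Illustrative excerpt of the YAML that `accelerate config` produces
# for multi-GPU with FSDP enabled (keys may differ between versions).
distributed_type: FSDP
fsdp_config:
  fsdp_sharding_strategy: 1   # FULL_SHARD
  fsdp_auto_wrap_policy: TRANSFORMER_BASED_WRAP
  fsdp_offload_params: false

# Selecting TPU instead yields only something like:
# distributed_type: TPU
# ...with no fsdp_config section offered.
```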
Bottom line: If I want to use FSDP to train on a TPU pod, am I forced to use the Trainer API instead of accelerate?