How to use FSDP + DDP in Trainer

Hi - I want to train a model on a large number of GPUs (e.g. 256). I want 4-way data parallelism (DDP) that replicates the full model, and within each replica use FSDP to shard the model across 64 GPUs. Is there a code example?
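For concreteness, this is roughly what I mean in native PyTorch (just a sketch, assuming PyTorch >= 2.2 for the device-mesh API and a `torchrun` launch across 256 ranks; the tiny `Linear` is only a stand-in for the real model):

```python
import os

import torch
import torch.distributed as dist
from torch.distributed.device_mesh import init_device_mesh
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, ShardingStrategy

# Launched with torchrun across 256 ranks (e.g. 32 nodes x 8 GPUs).
dist.init_process_group("nccl")
torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))

# 2D mesh: dim 0 = 4 replicas (DDP-style gradient all-reduce between them),
#          dim 1 = 64 ranks that each replica's parameters are sharded over.
mesh = init_device_mesh("cuda", (4, 64), mesh_dim_names=("replicate", "shard"))

model = torch.nn.Linear(1024, 1024).cuda()  # stand-in for the real model

# HYBRID_SHARD = FSDP sharding inside each 64-rank group,
# replication across the 4 groups.
model = FSDP(
    model,
    device_mesh=mesh,
    sharding_strategy=ShardingStrategy.HYBRID_SHARD,
)
```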

I know how to write this in native PyTorch (roughly as sketched above), but how do I do it with the Trainer? Is it supported?
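In case it helps, the closest thing I have found so far is the `hybrid_shard` FSDP option in `TrainingArguments` (again only a sketch, assuming a recent transformers/accelerate stack that accepts this option; the model name, dummy dataset, and wrapped layer class are placeholders):

```python
from datasets import Dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "gpt2"  # placeholder; swap in the real model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny dummy dataset just so the sketch runs end to end.
ds = Dataset.from_dict({"text": ["hello world"] * 64})
ds = ds.map(
    lambda b: tokenizer(b["text"], truncation=True, padding="max_length", max_length=32),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    # "hybrid_shard" should map to FSDP's ShardingStrategy.HYBRID_SHARD:
    # shard within a replica group, DDP-style all-reduce across groups.
    fsdp="hybrid_shard auto_wrap",
    fsdp_config={
        # On older transformers versions this key is "fsdp_transformer_layer_cls_to_wrap".
        "transformer_layer_cls_to_wrap": ["GPT2Block"],
    },
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

This would be launched with something like `torchrun --nnodes 32 --nproc_per_node 8 train.py` or `accelerate launch`. My understanding is that plain HYBRID_SHARD shards within each node and replicates across nodes by default, so getting exactly 64-way sharding with 4 replicas probably needs custom process groups or a device mesh, which I don't think `TrainingArguments` exposes directly. Corrections welcome.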

Did you figure it out?