Tensor shape mismatch error when doing an allgather in distributed training with FSDP

I think FSDP itself falls under the accelerate library (or PyTorch itself), but the function that was executing when the error occurred comes from the Transformers library, so I think it would be fine to raise this on the Transformers GitHub.
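
For context, a minimal sketch of where FSDP actually lives: the `FullyShardedDataParallel` wrapper, and the all-gather of sharded parameters inside it, comes from `torch.distributed.fsdp` in PyTorch, while accelerate and Transformers only configure and call it. The model and tensor sizes below are placeholders for illustration, not taken from the actual failing run.

```python
# Minimal FSDP sketch (run with torchrun); the model and sizes are hypothetical.
import torch
import torch.distributed as dist
from torch.distributed.fsdp import FullyShardedDataParallel as FSDP

def main():
    dist.init_process_group("nccl")  # assumes torchrun provides RANK/WORLD_SIZE
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    model = torch.nn.Linear(1024, 1024).cuda()  # placeholder model
    # The all-gather of sharded parameters happens inside this PyTorch wrapper,
    # regardless of whether it is driven by accelerate or the Transformers Trainer.
    model = FSDP(model)

    x = torch.randn(8, 1024, device="cuda")
    loss = model(x).sum()
    loss.backward()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```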
