I’m guessing no, because FSDP distributes the model over the GPUs in a very specific way, and device_map="auto" might not align with that. Is my understanding correct? Is that why training usually does not work with device_map="auto"?
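
To make the mismatch I have in mind concrete, here is a toy sketch in plain Python. It makes no torch or accelerate calls, and every name in it (layer_sizes, place_whole_layers, shard_every_layer) is hypothetical. It assumes that device_map="auto" puts each layer whole on a single GPU, while FSDP gives every GPU a slice of every layer. The real libraries are more involved (as far as I know, the automatic device map balances by available memory, and FSDP shards flattened parameter groups per wrapped module), so treat this as a simplification, not the actual implementation.

```python
# Toy illustration only: plain Python, no torch/accelerate calls.
# All names below are hypothetical and exist only for this sketch.

def place_whole_layers(layer_sizes, num_gpus):
    """device_map-style placement: each layer lives entirely on one GPU.

    Simplification: layers are split evenly by count, in order.
    """
    per_gpu = -(-len(layer_sizes) // num_gpus)  # ceiling division
    return {name: i // per_gpu for i, name in enumerate(layer_sizes)}


def shard_every_layer(layer_sizes, num_gpus):
    """FSDP-style sharding: every GPU holds a slice of every layer."""
    shards = {}
    for name, size in layer_sizes.items():
        base, extra = divmod(size, num_gpus)
        # The first `extra` ranks take one additional parameter each.
        shards[name] = [base + (1 if rank < extra else 0) for rank in range(num_gpus)]
    return shards


# Made-up parameter counts per layer.
layer_sizes = {"embed": 1000, "block0": 400, "block1": 400, "head": 1000}

placement = place_whole_layers(layer_sizes, num_gpus=2)
shards = shard_every_layer(layer_sizes, num_gpus=2)

print(placement)  # layer name -> index of the single GPU that owns it
print(shards)     # layer name -> parameters held by each GPU rank
```

In the first layout a given GPU owns some layers completely and others not at all. In the second, every GPU owns part of every layer. That difference is what I suspect makes the two approaches clash.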