Docs Clarification: Is prepare() inefficient for models that are frozen?

The docs say “You don’t need to prepare a model if you only use it for inference without any kind of mixed precision”, but they leave it unclear whether preparing such a model anyway makes any difference or adds any overhead.

For example, I have an nn.Module that I will train in bfloat16 with distributed training, but several of its child components (themselves nn.Modules) are frozen and used purely for inference.

In this case, is there a difference (specifically w.r.t. performance) between accelerator.prepare(model.trainable_submodel) and accelerator.prepare(model), besides having to manually move the frozen parts to the right device/dtype in the first case? Obviously the second is more convenient, but does wrapping the whole model cause any performance loss? (In my case I’m using simple DDP.)

I saw this related question but it doesn’t directly address this.

Thanks in advance!