The docs state “You don’t need to prepare a model if you only use it for inference without any kind of mixed precision,” but they don’t make clear whether preparing such a model anyway introduces any difference or inefficiency.
For example, I have an nn.Module that I will perform distributed training with in bfloat16, but it has several child components (themselves nn.Modules) that are frozen for pure inference.
In this case, is there a difference (specifically w.r.t. performance) between `accelerator.prepare(model.trainable_submodel)` and `accelerator.prepare(model)`, besides having to manually move the frozen parts to the right device/dtype in the first case? The second is obviously more convenient, but does wrapping the whole model cause any performance loss? (In my case I’m using plain DDP.)
I saw this related question but it doesn’t directly address this.
Thanks in advance!