The docs state “You don’t need to prepare a model if you only use it for inference without any kind of mixed precision,” but they don’t make clear whether preparing such a model anyway introduces any difference or inefficiency.
For example, I have an nn.Module that I will perform distributed training with in bfloat16, but it has several child components (themselves nn.Modules) that are frozen for pure inference.
In this case, is there a difference (specifically w.r.t. performance) between `accelerator.prepare(model.trainable_submodel)` and `accelerator.prepare(model)`, besides having to manually move the frozen parts to the right device/dtype in the first case? The second is obviously more convenient, but does wrapping the whole model cause any performance loss? (In my case I’m using plain DDP.)
I saw this related question but it doesn’t directly address this.
Thanks in advance!