Hi all
To test and debug our PEFT script for very large models (405B), we want to avoid downloading the whole model on every rerun, in order to reduce experimentation time (and thus cost). Downloading the model once and storing it is unfortunately not an option.
To this end, we want the model loaded as if it had been properly initialized (e.g. via from_pretrained), i.e. with the correct dtype, distributed across the GPUs/nodes, and so on, just with random weights for now and without the download.
When using something like
mconfig = AutoConfig.from_pretrained(train_config.model_name)
model = LlamaForCausalLM(mconfig)
the model is initialized on the CPU, RAM runs out, and we get an OOMKilled error.
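We suspect something like accelerate's init_empty_weights could avoid materializing the weights on CPU, but it only gets us part of the way. A minimal sketch of what we mean (reusing the names from above):

from accelerate import init_empty_weights
from transformers import AutoConfig, LlamaForCausalLM

mconfig = AutoConfig.from_pretrained(train_config.model_name)

# Instantiate on the "meta" device: no real weight tensors are allocated,
# so CPU RAM is not exhausted. However, the weights still need to be
# materialized and dispatched across the GPUs/nodes afterwards, which is
# the part we don't know how to do cleanly with random weights.
with init_empty_weights():
    model = LlamaForCausalLM(mconfig)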
So, ideally, we would want to use something like
model = LlamaForCausalLM.from_pretrained(
    train_config.model_name,
    device_map="auto",
    torch_dtype=torch.float16,  # or torch.bfloat16
    random_weights=True,
)
since the from_pretrained function seems to handle all the complexities we need.
Is there any easy way to do this? We checked the parameters of from_pretrained, but did not find anything applicable.
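For context, the loaded model would then be wrapped for PEFT roughly as follows; this is only a sketch assuming LoRA, and the hyperparameters are placeholders rather than our actual settings:

from peft import LoraConfig, get_peft_model

# Placeholder LoRA config, just to illustrate how the model is consumed downstream.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()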
Thank you for any ideas,
Jonas