Initializing a big model on GPU with random weights

Hi all :hugs:

To test and debug our PEFT script for very large models (405B), we want to avoid downloading the whole model on every rerun, in order to reduce experimentation time (and thus cost). Downloading once and storing the weights is unfortunately not an option for us.

To this end, we want the model to end up as if it had been loaded properly (e.g. via from_pretrained), i.e. with the correct dtype, distributed across the GPUs/nodes, and so on, just with random weights for now and without the download.

When using something like

from transformers import AutoConfig, LlamaForCausalLM

mconfig = AutoConfig.from_pretrained(train_config.model_name)
model = LlamaForCausalLM(mconfig)

the model gets instantiated on CPU, the RAM runs out, and the process gets OOMKilled.
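
As far as we understand, instantiating from the config creates all parameters in fp32 in host RAM, which for 405B parameters is roughly 1.6 TB. The closest variant we are aware of is passing a torch_dtype to from_config (minimal sketch below, assuming from_config forwards the dtype), which at least halves that, but ~810 GB still does not fit for us:

import torch
from transformers import AutoConfig, AutoModelForCausalLM

# only config.json is downloaded here, which is negligible
mconfig = AutoConfig.from_pretrained(train_config.model_name)

# parameters are created in bf16 instead of fp32, but still in host RAM
model = AutoModelForCausalLM.from_config(mconfig, torch_dtype=torch.bfloat16)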

So, ideally, we would want to use something like

model = LlamaForCausalLM.from_pretrained(
    train_config.model_name,
    device_map="auto",
    torch_dtype=torch.float16,  # or torch.bfloat16
    random_weights=True,  # hypothetical flag -- this is what we wish existed
)

since the from_pretrained function seems to handle all the complexities we need.

Is there any easy way to do this? We checked the parameters of from_pretrained, but did not find anything applicable.

Thank you for any ideas,
Jonas


device="meta"?
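
i.e. build the skeleton on the meta device, so nothing is downloaded and (almost) nothing is allocated. With a recent PyTorch, something like this (untested):

import torch
from transformers import AutoConfig, LlamaForCausalLM

mconfig = AutoConfig.from_pretrained(train_config.model_name)

# parameters become meta tensors: correct shapes and dtypes, but no storage
with torch.device("meta"):
    model = LlamaForCausalLM(mconfig)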

Hi @John6666,
Thanks for the suggestion! But the meta device doesn't quite cut it for us.

We need real weights on the devices, because we also want to run forward/backward passes and measure time taken, memory load, etc.
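
As far as we understand, a meta-initialized skeleton would still have to be materialized explicitly on every device, roughly along the lines of the (untested) sketch below; and for 405B this only makes sense once the model is already sharded (e.g. per rank via FSDP's param_init_fn), which is exactly the part we don't know how to do cleanly:

import torch
from transformers import AutoConfig, LlamaForCausalLM

mconfig = AutoConfig.from_pretrained(train_config.model_name)

# build the skeleton in bf16 on the meta device: no download, no allocation
torch.set_default_dtype(torch.bfloat16)
with torch.device("meta"):
    model = LlamaForCausalLM(mconfig)

# allocate real, uninitialized storage on the local GPU
# (the full 405B model would of course not fit on a single GPU)
model = model.to_empty(device="cuda")

# fill the parameters with random values so forward/backward produce finite numbers;
# buffers (e.g. the rotary inv_freq) would probably also need to be recomputed
with torch.no_grad():
    for p in model.parameters():
        torch.nn.init.normal_(p, mean=0.0, std=0.02)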

Any other ideas?

Best,
Jonas
