CUDA OOM on model(inputs) but not on model.generate(inputs), but doesn't generate use model(inputs)?

Using the same model inputs:

kwargs = {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
}

I get a CUDA out of memory error when I do

outputs = model(**kwargs)

but not when I do

output_ids = model.generate(
    **kwargs,
    do_sample=False,
    max_new_tokens=max_output_len,
    pad_token_id=tokenizer.eos_token_id,
    top_p=None,
)

Doesn't model.generate do a model(**kwargs)-like operation several times internally? Why is my version so memory inefficient?

The model is microsoft/phi-1_5 and the transformers version is 4.40.1.

Hi!

The reason is that a plain model.forward() call tracks gradients, keeping all intermediate activations around for a potential backward pass, unless you wrap it in:

with torch.no_grad():
    model.forward(**inputs)

The generate method already runs under the no-grad decorator internally, so it does not use much memory :hugs:
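
For context, a quick way to see the difference is to compare peak CUDA memory for the two kinds of forward call. This is only a sketch: it assumes model, input_ids and attention_mask from above are already on the GPU, and it uses the standard torch.cuda memory utilities.

import torch

def peak_forward_memory(use_no_grad: bool) -> int:
    # Reset the peak-memory counter before running one forward pass.
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    if use_no_grad:
        with torch.no_grad():
            model(input_ids=input_ids, attention_mask=attention_mask)
    else:
        # Gradient tracking keeps every intermediate activation alive
        # for a potential backward pass, so the peak is much higher.
        model(input_ids=input_ids, attention_mask=attention_mask)
    return torch.cuda.max_memory_allocated()

print("plain forward:        ", peak_forward_memory(False))
print("with torch.no_grad(): ", peak_forward_memory(True))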


Ah, of course! Thank you raushan! The community delivers.

As a follow-up though, would model.eval() not do the same thing as:

with torch.no_grad():
    outputs = model.forward(**inputs)

print(outputs)

No, model.eval() only switches layers that behave differently at inference time, e.g. batchnorm or dropout layers, into evaluation mode.
torch.no_grad() is what deactivates gradient tracking and thereby saves the memory.
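
A small sketch of the distinction, again assuming the model and inputs from earlier in the thread: under model.eval() the output still tracks gradients, while under torch.no_grad() it does not.

import torch

model.eval()  # switches dropout/batchnorm to eval mode; gradients are still tracked
out_eval = model(input_ids=input_ids, attention_mask=attention_mask)
print(out_eval.logits.requires_grad)  # True: activations are kept for a backward pass

with torch.no_grad():  # disables gradient tracking entirely
    out_nograd = model(input_ids=input_ids, attention_mask=attention_mask)
print(out_nograd.logits.requires_grad)  # False: this is what saves the memory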
