Using the same model inputs:
kwargs = {
    "input_ids": input_ids,
    "attention_mask": attention_mask,
}
I get a CUDA out-of-memory error when I do
outputs = model(**kwargs)
but not when I do
output_ids = model.generate(
    **kwargs,
    do_sample=False,
    max_new_tokens=max_output_len,
    pad_token_id=tokenizer.eos_token_id,
    top_p=None,
)
Doesn't model.generate do a model(**kwargs)-like operation several times? Why is my version so memory-inefficient?
The model is microsoft/phi-1_5 and the transformers version is 4.40.1.
Hi!
The reason is that model.forward() tracks gradients (it keeps the activations needed for backpropagation) unless you wrap the call:

with torch.no_grad():
    model.forward(**inputs)

The generate method already has the no-gradient decorator, so it does not use nearly as much memory.
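For reference, here is a minimal self-contained sketch of the gradient-free forward pass; the prompt and the device placement are assumptions for illustration, not taken from your post:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Model name and transformers version are from the question above.
model_name = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

prompt = "def fibonacci(n):"  # hypothetical prompt, just for the example
enc = tokenizer(prompt, return_tensors="pt").to("cuda")
kwargs = {
    "input_ids": enc["input_ids"],
    "attention_mask": enc["attention_mask"],
}

# Without no_grad(), forward() would store activations for a backward pass,
# which is what blows up the memory usage.
with torch.no_grad():
    outputs = model(**kwargs)

print(outputs.logits.shape)  # (batch, seq_len, vocab_size)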
Ah, of course! Thank you, raushan! The community delivers.
As a follow-up though, wouldn't model.eval() do the same thing as:

with torch.no_grad():
    outputs = model.forward(**inputs)
print(outputs)
No, model.eval() switches layers that behave differently at inference time, e.g. batchnorm or dropout, into eval mode. torch.no_grad() is what deactivates gradient calculation and saves the memory.
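A toy sketch of the difference; the dropout layer and input are made up purely for illustration:

import torch
import torch.nn as nn

layer = nn.Dropout(p=0.5)
x = torch.ones(4, requires_grad=True)

# eval() changes layer behaviour: dropout becomes a no-op,
# but gradients are still tracked on the output.
layer.eval()
y = layer(x)
print(y, y.requires_grad)   # all ones, requires_grad=True

# no_grad() disables gradient tracking, but dropout (back in train mode)
# still zeroes some entries and rescales the survivors.
layer.train()
with torch.no_grad():
    z = layer(x)
print(z, z.requires_grad)   # some zeros, requires_grad=False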