Inference without gradient computation?

Hi all,
I have some fairly inefficient code that only runs once, so performance doesn't matter to me for now. It's just inference on about 14k data points.

My code looks like this:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

# loop over dataset:
inputs = tokenizer(preprocess_sentence(train_data['notes'][i]),
                   return_tensors="pt", truncation=True, max_length=510)
sentence_vector = model(**inputs).pooler_output

I noticed that I ran out of memory very quickly, so I put the inference part into a with torch.no_grad(): block, which helps.
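Concretely, the change was just wrapping the forward pass (a minimal sketch, reusing the snippet above):

# inside the loop: skip building the autograd graph
with torch.no_grad():
    sentence_vector = model(**inputs).pooler_output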

Is it generally the case, as with other PyTorch models, that inference should be run with model.eval() and inside a with torch.no_grad(): block? I haven't found any reference to this in the Transformers library docs and thought it might be really useful for newbies.

I happened to find my own question again while searching for a similar one. :grinning:

To all of you who come here looking for an answer:

  1. Yes, for inference you should set the model to eval mode with model.eval(). This switches layers like dropout to their inference behavior, so the outputs are deterministic.
  2. Yes, if you have resource issues or want to keep memory usage low, also wrap the forward pass in with torch.no_grad():. This disables gradient tracking, so intermediate activations are not kept around for backpropagation.

Example code:

# set model to eval mode (disables dropout etc.)
model.eval()
inputs = tokenizer(…)

# disable gradient tracking to save memory
with torch.no_grad():
    outputs = model(**inputs)
…
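Putting it together for a loop like mine, here is a fuller sketch (preprocess_sentence and train_data are placeholders from my snippet above, not anything from the library):

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model.eval()  # dropout etc. switched to inference behavior

sentence_vectors = []
with torch.no_grad():  # no autograd graph -> much lower memory use
    for note in train_data['notes']:
        inputs = tokenizer(preprocess_sentence(note),
                           return_tensors="pt",
                           truncation=True, max_length=510)
        sentence_vectors.append(model(**inputs).pooler_output)

On PyTorch 1.9+ you can also use with torch.inference_mode(): instead of torch.no_grad(); it is a stricter mode and can be slightly faster.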