Inference without gradient computation?

Hi all,
I have some very inefficient code that only runs once, so the inefficiency doesn’t matter to me for now. It just runs inference on about 14k data points.

My code looks like this:

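from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
# loop over dataset (train_data and preprocess_sentence are defined elsewhere):
for i in range(len(train_data['notes'])):
    # truncation=True makes the max_length limit explicit
    inputs = tokenizer(preprocess_sentence(train_data['notes'][i]), return_tensors="pt", max_length=510, truncation=True)
    sentence_vector = model(**inputs).pooler_output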

I noticed that I run out of memory very quickly, so I put the inference part into a with torch.no_grad(): block, which helps.
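As far as I understand it, the reason this helps is that outside of no_grad every output tensor keeps a reference to the computation graph that produced it, so any outputs that are kept around (for example collected in a list) hold on to that graph as well. Here is a minimal sketch of the difference. The small nn.Linear layer is only a hypothetical stand-in for the real model, not the Bio_ClinicalBERT setup above:

```python
import torch
from torch import nn

# Hypothetical stand-in for a real encoder; chosen only to keep the example tiny.
layer = nn.Linear(8, 4)
x = torch.randn(2, 8)

# Default behavior: the output records how it was computed,
# so that backward() could be called on it later.
tracked = layer(x)

# Inside no_grad the same forward pass records nothing.
with torch.no_grad():
    untracked = layer(x)

print(tracked.requires_grad, tracked.grad_fn is not None)   # True True
print(untracked.requires_grad, untracked.grad_fn is None)   # False True
```

The values of both outputs are the same; only the autograd bookkeeping attached to them differs.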

Is it generally the case, as with other PyTorch models, that inference should be run with model.eval() and inside with torch.no_grad():? I haven’t found any reference to this in the Transformers library docs and thought it would be really useful for newbies.


I happened to find my own question again while searching for a similar one. :grinning:

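For everyone who comes here looking for an answer:

  1. Yes, for inference you should set the model to eval mode with model.eval(), which switches off training-only behavior such as dropout, and
  2. Yes, if you are short on memory or want to keep resource usage low, also wrap the forward pass in with torch.no_grad():, which skips the bookkeeping needed for backpropagation.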

Example code:

# set model to eval mode
model.eval()
inputs = tokenizer(…)

# set torch to no grad to save resources
with torch.no_grad():
    outputs = model(**inputs)
…
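
The two calls do different jobs, which is why both appear in the example above. Below is a small sketch that shows this with plain PyTorch. The toy model with a Dropout layer is my own assumption for illustration and has nothing to do with the actual Bio_ClinicalBERT weights:

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical toy model; the Dropout layer is what makes train/eval mode visible.
toy = nn.Sequential(nn.Linear(16, 16), nn.Dropout(p=0.5))
x = torch.randn(4, 16)

# no_grad alone: no graph is recorded, but dropout is still active in train mode,
# so two passes over the same input use different random masks.
toy.train()
with torch.no_grad():
    a, b = toy(x), toy(x)
print(a.requires_grad)      # False

# eval alone: dropout is disabled, but autograd still tracks the forward pass.
toy.eval()
c, d = toy(x), toy(x)
print(torch.equal(c, d))    # True
print(c.requires_grad)      # True
```

So eval() makes the outputs deterministic, while no_grad() is what saves the memory. As far as I know, from_pretrained already returns the model in evaluation mode, so the explicit model.eval() is mainly a safeguard in case the model was switched to train mode earlier.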

Thank you for the answers!
It’s very strange that I haven’t found anything about this while reading about training and fine-tuning in the Hugging Face NLP Course.
