Inference without gradient computation?

Hi all,
I have some very inefficient code that only runs once, so efficiency doesn’t matter much for now. It’s just inference over about 14k data points.

My code looks like this:

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
for i in range(len(train_data['notes'])):  # loop over dataset
    # note: truncation=True is needed for max_length to actually truncate
    inputs = tokenizer(preprocess_sentence(train_data['notes'][i]),
                       return_tensors="pt", truncation=True, max_length=510)
    sentence_vector = model(**inputs).pooler_output

I noticed that I ran out of memory very quickly, so I put the inference part inside a with torch.no_grad(): block, which helps.
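To illustrate why the no_grad() block helps, here is a minimal sketch (using a plain torch.nn.Linear as a stand-in for the BERT model, just to keep it small): outputs produced inside torch.no_grad() carry no grad_fn, so PyTorch doesn’t retain the autograd graph and the intermediate activations it would otherwise keep around for a backward pass.

```python
import torch

model = torch.nn.Linear(4, 2)
model.eval()  # disable dropout/batchnorm training behavior

x = torch.randn(1, 4)

# Outside no_grad: the output is attached to the autograd graph,
# because the Linear layer's parameters require gradients.
y_with_graph = model(x)
assert y_with_graph.grad_fn is not None

# Inside no_grad: no graph is built, so intermediate buffers
# can be freed immediately -- this is where the memory saving comes from.
with torch.no_grad():
    y = model(x)
assert y.grad_fn is None
```

The same pattern applies to the BERT loop above: model.eval() once before the loop, and the forward pass wrapped in torch.no_grad().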

Is it generally the case, as with other PyTorch models, that inference should be run with model.eval() and inside a with torch.no_grad(): block? I haven’t found any mention of this in the Transformers library docs and thought it might be really useful for newbies.