Hi all,
I have some very inefficient code that just runs once, so it doesn’t matter to me for now. It’s just inference of about 14k data points.
My code looks like this:
```python
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

# loop over the dataset:
for i in range(len(train_data['notes'])):
    # note: truncation=True is needed for max_length to actually take effect
    inputs = tokenizer(preprocess_sentence(train_data['notes'][i]),
                       return_tensors="pt", truncation=True, max_length=510)
    sentence_vector = model(**inputs).pooler_output
```
I noticed that I was running out of memory very quickly, so I put the inference part inside a `with torch.no_grad():` block, which helps.
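For reference, here is a minimal sketch of that fix (same `tokenizer`, `model`, `train_data`, and `preprocess_sentence` as above):

```python
import torch

sentence_vectors = []
# no_grad() stops autograd from recording the forward pass, so
# intermediate activations are freed instead of kept for backprop
with torch.no_grad():
    for note in train_data['notes']:
        inputs = tokenizer(preprocess_sentence(note),
                           return_tensors="pt", truncation=True, max_length=510)
        sentence_vectors.append(model(**inputs).pooler_output)
```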
Is it generally the case, as with other PyTorch models, that inference should be run with `model.eval()` and inside `with torch.no_grad():`? I haven't found any reference to this in the Transformers library docs, and I think it would be really useful for newbies.
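For context, my current understanding (happy to be corrected) is that the two do different things: `model.eval()` switches modules such as dropout into inference behaviour, while `torch.no_grad()` turns off gradient tracking. A sketch of how I'd combine them:

```python
model.eval()              # disable dropout (BERT-style models contain dropout layers)
with torch.no_grad():     # disable gradient bookkeeping to save memory
    sentence_vector = model(**inputs).pooler_output
```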