I have some very inefficient code that just runs once, so it doesn’t matter to me for now. It’s just inference of about 14k data points.
My code looks like this:
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")
model = AutoModel.from_pretrained("emilyalsentzer/Bio_ClinicalBERT")

# loop over dataset:
inputs = tokenizer(preprocess_sentence(train_data['notes'][i]),
                   return_tensors="pt", truncation=True, max_length=510)
sentence_vector = model(**inputs).pooler_output

(Note: I added truncation=True, because max_length on its own does not actually truncate the input.)
I noticed that I ran out of memory very quickly, so I wrapped the inference part in a
with torch.no_grad(): block, which helps.
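To illustrate what the block changes, here is a minimal sketch with a toy nn.Linear instead of BERT (same idea: under no_grad, no autograd graph is recorded, so intermediate activations can be freed immediately):

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 4)
model.eval()  # also disable dropout etc. for inference

x = torch.randn(1, 8)

# Without no_grad: the output carries a grad_fn and the graph is kept alive.
y_train = model(x)
print(y_train.requires_grad)  # True

# With no_grad: nothing is recorded for backprop, saving memory.
with torch.no_grad():
    y_infer = model(x)
print(y_infer.requires_grad)  # False
```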
Is it generally the case, as with all other PyTorch models, that inference should be run inside
with torch.no_grad():? I haven't found any mention of this in the Transformers library docs and thought it might be really useful for newbies.