Hello. How can one speed up inference across a list of strings?
To be specific, I'm not (yet) interested in speeding up inference on any one pass of the loop below. I am interested in whether it's possible to run different elements of the list MylistOfStrings through the loop at the same time on different CPUs (the order of the output is not important; I don't have GPU access). I have no experience with multiprocessing / parallelization code. Is this even possible across multiple CPU cores?
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# (AutoModelWithLMHead is deprecated; the hub id is lowercase 't5-small')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')
tokenizer = AutoTokenizer.from_pretrained('t5-small')

MyOutputArr = []
for s in MylistOfStrings:
    input_sent = 'translate: ' + s + ' </s>'
    inputs = tokenizer.encode(input_sent, return_tensors="pt", max_length=512)
    outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
    MySent = ' '.join([tokenizer.decode(ids, skip_special_tokens=True) for ids in outputs])
    MyOutputArr.append(MySent)
print(MyOutputArr)
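From what I've read, multiprocessing.Pool might be the kind of thing I'm after. Below is a toy sketch of the pattern as I understand it; translate() here is just a dummy stand-in (it upper-cases the string) for the real tokenize / model.generate / decode step, since I'm not sure how the model itself should be shared across workers.

```python
from multiprocessing import Pool

def translate(s):
    # Placeholder for the real work: tokenize s, call model.generate,
    # decode the result. Here it just upper-cases the string.
    return s.upper()

if __name__ == "__main__":
    MylistOfStrings = ["one sentence", "another sentence", "a third one"]
    # Each worker process pulls items off the list and runs translate()
    # on them independently; map() collects the results.
    with Pool(processes=4) as pool:
        results = pool.map(translate, MylistOfStrings)
    print(results)
```

Is this the right pattern, and would each worker process need to load its own copy of the model (which sounds memory-hungry for T5)?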
Thank you for any assistance.