Hello. How can one speed up inference across a list of strings?
To be specific, I'm not (yet) interested in speeding up inference on any one pass of the loop below. I am interested in whether it's possible to run different elements of the list MylistOfStrings through the loop at the same time on different CPUs (the order of the output is not important; I don't have GPU access). I have no experience with multiprocessing / parallelization code. Is this even possible across multiple CPU cores?
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# (AutoModelWithLMHead is deprecated; the hub id is lowercase 't5-small')
model = AutoModelForSeq2SeqLM.from_pretrained('t5-small')
tokenizer = AutoTokenizer.from_pretrained('t5-small')

MyOutputArr = []
for s in MylistOfStrings:
    input_sent = 'translate: ' + s + ' </s>'
    inputs = tokenizer.encode(input_sent, return_tensors="pt", max_length=512)
    outputs = model.generate(inputs, max_length=150, min_length=40, length_penalty=2.0, num_beams=4, early_stopping=True)
    MySent = ' '.join([tokenizer.decode(ids, skip_special_tokens=True) for ids in outputs])
    MyOutputArr.append(MySent)
print(MyOutputArr)
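From what I've read, multiprocessing.Pool might be the kind of thing I'm after. Below is a toy sketch of the pattern as I understand it; translate() here is just a dummy stand-in (it upper-cases the string) for the real tokenize / model.generate / decode step, since I'm not sure how the model itself should be shared across workers.

```python
from multiprocessing import Pool

def translate(s):
    # Placeholder for the real work: tokenize s, call model.generate,
    # decode the result. Here it just upper-cases the string.
    return s.upper()

if __name__ == "__main__":
    MylistOfStrings = ["one sentence", "another sentence", "a third one"]
    # Each worker process pulls items off the list and runs translate()
    # on them independently; map() collects the results.
    with Pool(processes=4) as pool:
        results = pool.map(translate, MylistOfStrings)
    print(results)
```

Is this the right pattern, and would each worker process need to load its own copy of the model (which sounds memory-hungry for T5)?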
Thank you for any assistance.