Hugging Face Forums
Beam_search bottlenecks inference with only 1 CPU core used
🤗Transformers
adelplace
October 13, 2022, 12:06pm
It seems I am not the only one facing this problem.
Any ideas for a solution?
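For context on why beam search can be CPU-bound: at each generation step it expands every beam, scores the candidates, and re-sorts them to keep the top k, and that bookkeeping runs as a sequential loop on the CPU between the model's forward passes. Below is a minimal toy sketch of the algorithm (all names and the per-step log-probability table are illustrative, not the `transformers` implementation):

```python
import math

def beam_search(step_logprobs, num_beams=3):
    """Toy beam search. step_logprobs[t][token] is the log-probability of
    emitting `token` at step t (independent of history, for simplicity).
    Returns the best (token_sequence, cumulative_logprob) pair."""
    beams = [([], 0.0)]  # (token sequence so far, cumulative log-prob)
    for logprobs in step_logprobs:
        candidates = []
        for seq, score in beams:
            for token, lp in enumerate(logprobs):
                candidates.append((seq + [token], score + lp))
        # Sorting and pruning candidates every step is the sequential,
        # CPU-bound bookkeeping that beam search adds on top of the
        # (GPU-accelerated) forward passes.
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = candidates[:num_beams]
    return beams[0]

# Two steps over a 3-token vocabulary (toy data).
steps = [
    [math.log(0.6), math.log(0.3), math.log(0.1)],
    [math.log(0.2), math.log(0.7), math.log(0.1)],
]
best_seq, best_score = beam_search(steps, num_beams=2)
```

Because this pruning loop runs in Python and depends on the previous step's result, it uses a single core regardless of how many are available, which matches the single-busy-CPU symptom described in the thread.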