Simultaneous processing of multiple queries to an LLM

I’m trying to keep an LLM loaded in memory and process requests without a queue. I want the model to handle several requests concurrently, up to 10 requests per second. I realize this requires a very powerful server, but the main question is: is it possible at all? Do any models or inference engines support multithreaded (concurrent) execution? Are there open-source models, such as OpenOrca’s Mistral fine-tune, that can serve multiple requests simultaneously? I’d be grateful for any advice!
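To make the question concrete, here is a minimal sketch of the kind of setup I mean. It assumes the vLLM library and the `Open-Orca/Mistral-7B-OpenOrca` checkpoint (my choices for illustration, not requirements); vLLM keeps the model resident on the GPU and uses continuous batching, so a batch of prompts is decoded together rather than one after another:

```python
# Sketch: in-process batched inference with vLLM
# (assumes `pip install vllm` and a GPU with enough memory for the model).
from vllm import LLM, SamplingParams

# Load the model once; it stays resident in GPU memory between calls.
llm = LLM(model="Open-Orca/Mistral-7B-OpenOrca")

sampling = SamplingParams(temperature=0.7, max_tokens=128)

# Ten requests submitted as one batch; vLLM's continuous batching
# schedules their decoding steps together instead of serially.
prompts = [f"Request {i}: summarize the benefits of batching." for i in range(10)]

outputs = llm.generate(prompts, sampling)
for out in outputs:
    print(out.prompt, "->", out.outputs[0].text.strip())
```

(This shows offline batching of a fixed prompt list; for independent requests arriving over time, as I understand it, vLLM’s async engine or its OpenAI-compatible server would be the closer fit.)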
