Hi everyone, I’m currently using facebook/bart-large-mnli to classify an input phrase into categories.
I’ve realized that the Hugging Face inference endpoint handles requests synchronously, which is a big problem for what I want to do, given that the infrastructure needs to support hundreds of users.
Do you know of any alternative that supports concurrent requests? Thanks!