Unable to Process Concurrent User Requests

I am using https://huggingface.co/ktrapeznikov/biobert_v1.1_pubmed_squad_v2/tree/main# for question answering. I send around 50 abstracts from PubMed as context and then ask a question. It works fine for a single user, but when I scale to 10 concurrent users the model takes too long to respond. Can anybody help?

There are a number of avenues you could explore to reduce inference time:

  1. Scale your deployment vertically (a bigger machine or a GPU) or horizontally (more model replicas); see the serving sketch after this list.
  2. Move to a smaller model, accepting some loss of biomedical domain accuracy.
  3. Improve your preprocessing/postprocessing efficiency, for example by batching all 50 abstracts through the pipeline in one call instead of looping over them; see the batching sketch below.
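For the batching side, here is a minimal sketch. It assumes (since the post doesn't show any code) that you are currently calling the model once per abstract; passing all the abstracts in a single pipeline call lets the model process them in batches. The function name, `batch_size`, and `device=0` are illustrative choices, not anything from the original post.

```python
from transformers import pipeline

# Assumed setup: one pipeline instance scoring ~50 PubMed abstracts
# against a single question. device=0 uses the first GPU; remove it to stay on CPU.
qa = pipeline(
    "question-answering",
    model="ktrapeznikov/biobert_v1.1_pubmed_squad_v2",
    device=0,
)

def answer(question, abstracts, batch_size=16):
    # Pass all (question, abstract) pairs in one call so the pipeline can
    # batch them, instead of invoking the model once per abstract.
    results = qa(
        question=[question] * len(abstracts),
        context=abstracts,
        batch_size=batch_size,  # batching needs a reasonably recent transformers release
    )
    # Keep the highest-scoring span across all abstracts.
    return max(results, key=lambda r: r["score"])
```

If you also swap in a smaller checkpoint here (for example `distilbert-base-cased-distilled-squad`), per-request latency drops further, but that model was not trained on PubMed text, so expect some loss of biomedical accuracy.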
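For scaling out, the serving sketch below assumes FastAPI as the serving layer (the post doesn't say what you are actually using). Each worker process loads its own copy of the model at startup, so concurrent users are spread across replicas instead of queuing behind a single model instance.

```python
# app.py - minimal sketch; FastAPI and the /answer route are assumptions.
from typing import List

from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()

# Each worker process gets its own model copy, so requests handled
# by different workers run in parallel.
qa = pipeline(
    "question-answering",
    model="ktrapeznikov/biobert_v1.1_pubmed_squad_v2",
)

class QARequest(BaseModel):
    question: str
    abstracts: List[str]

@app.post("/answer")
def answer(req: QARequest):
    results = qa(question=[req.question] * len(req.abstracts),
                 context=req.abstracts)
    best = max(results, key=lambda r: r["score"])
    return {"answer": best["answer"], "score": best["score"]}
```

Running this with several workers, e.g. `gunicorn -w 4 -k uvicorn.workers.UvicornWorker app:app`, gives you four replicas on one machine; putting the same service behind a load balancer on multiple machines is the horizontal version of the same idea.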