Allow Multiple Processes at Once

Dupaja · January 2, 2024, 1:40am

Hello! New to HF here, and fairly new to ML.

I put together (through a fair amount of trial and error) a handler.py for my own copy of Microsoft’s SpeechT5 TTS model.
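For readers who haven't written one, the sketch below shows the general shape of a custom handler.py for Inference Endpoints. It is not the actual file linked below. The model call is replaced by a hypothetical stub, `fake_synthesize`, so it runs without transformers. The `EndpointHandler` class with `__init__(path)` and `__call__(data)` follows the documented custom-handler convention, and the request text is assumed to arrive under the `"inputs"` key.

```python
from typing import Any, Dict, List


def fake_synthesize(text: str) -> List[float]:
    # Hypothetical stand-in for the SpeechT5 model + vocoder call;
    # returns a dummy "waveform" whose length depends on the input text.
    return [0.0] * (len(text) * 10)


class EndpointHandler:
    def __init__(self, path: str = "") -> None:
        # In a real handler, the processor, model and vocoder would be
        # loaded from `path` here, once per process.
        self.path = path
        self.sampling_rate = 16000  # SpeechT5 with its HiFi-GAN vocoder produces 16 kHz audio

    def __call__(self, data: Dict[str, Any]) -> Dict[str, Any]:
        # The JSON request body arrives as a dict; the text to speak is
        # conventionally under "inputs" (falling back to the whole payload).
        text = data.pop("inputs", data)
        audio = fake_synthesize(text)
        return {"sampling_rate": self.sampling_rate, "audio": audio}


if __name__ == "__main__":
    handler = EndpointHandler()
    result = handler({"inputs": "Hello!"})
    print(result["sampling_rate"], len(result["audio"]))  # prints: 16000 60
```

Because each instance loads the model once in `__init__` and then serves every call, how many requests run at the same time depends on how many handler instances or threads the serving side creates, not on the handler code itself.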

However, it seems to process only one connection at a time, while any others wait. On the Inference Endpoint, it looks like only a single core and 2-3 GB of the 16 GB available are in use at any given time. Is there a way to let it use multiple cores or instances?
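To illustrate the difference between what I'm seeing (requests queuing behind each other) and what I'm after (requests overlapping), here is a minimal plain-Python sketch. It does not reflect how Inference Endpoints actually schedules requests. `fake_inference` is a hypothetical stand-in that sleeps instead of running SpeechT5, and a `ThreadPoolExecutor` with 1 vs. 2 workers stands in for one vs. several serving workers.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class InFlightCounter:
    """Tracks how many simulated requests are being processed at the same time."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.current = 0
        self.peak = 0

    def enter(self) -> None:
        with self._lock:
            self.current += 1
            self.peak = max(self.peak, self.current)

    def exit(self) -> None:
        with self._lock:
            self.current -= 1


def fake_inference(counter: InFlightCounter, seconds: float = 0.2) -> None:
    # Hypothetical stand-in for one TTS request. time.sleep() releases the GIL,
    # as many PyTorch operations do, so threads can overlap here.
    counter.enter()
    time.sleep(seconds)
    counter.exit()


def run(num_requests: int, workers: int) -> int:
    """Submit requests to a pool of `workers` threads; return peak concurrency."""
    counter = InFlightCounter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for _ in range(num_requests):
            pool.submit(fake_inference, counter)
    # Leaving the `with` block waits for all submitted requests to finish.
    return counter.peak


if __name__ == "__main__":
    print("1 worker: ", run(num_requests=2, workers=1))  # requests queue up
    print("2 workers:", run(num_requests=2, workers=2))  # requests overlap
```

With one worker the peak is 1, matching the "one connection at a time" behavior; with two workers both requests are in flight together. Whether real inference overlaps this cleanly depends on the model releasing the GIL and on available cores and memory.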

https://huggingface.co/Dupaja/speecht5_tts/blob/main/handler.py is the file, for reference.

Thanks!

Related topics

Topic	Category	Replies	Views	Activity
Realtime speech-to-text solution?	Beginners	1	995	July 24, 2024
Raise Inference Client GB Limit	Inference Endpoints on the Hub	3	117	July 20, 2024
Conversational Memory with HF inference endpoints	Inference Endpoints on the Hub	0	342	February 1, 2024
Serverless Inference API [error 500]	Inference Endpoints on the Hub	2	62	January 23, 2025
Dedicated inference endpoint failing to initialize	Inference Endpoints on the Hub	0	40	November 21, 2024