Oh, I didn't realise you had a Pro account. Have you tried 'Spaces Hardware'?
That will likely be much faster, and the payment plan for the CPUs/GPUs has a sleep mode, so there's no GPU cost when it's not being used.
Other things to keep in mind:
- caching: to avoid repeated inference when requests are similar
- load balancing: to split the inference workload across resources
- Edge: In terms of inference, you shouldn't need anything too high-end unless you expect a large number of concurrent requests. I switch hosting between my 4090 and my 4070 Super portable PC, and yes there is a difference, but it's not as bad as you'd expect. You could get away with a lot less if it was a small model like a Llama/Gemma in the ~3B range. If you go that route, run it from the NVIDIA AI Workbench WSL distro if you have an NVIDIA card (just in the WSL, not via Workbench itself), or use any Ubuntu WSL, a Docker container, or something like a Pop!_OS partition.
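To give a quick picture of the caching point above: if identical prompts can recur, memoising the inference call means repeats never hit the GPU. A minimal sketch (`run_model` here is a made-up stand-in for your actual inference call):

```python
from functools import lru_cache

# Hypothetical stand-in for your real model call; swap in your
# actual inference function (e.g. an HF pipeline or llama.cpp call).
def run_model(prompt: str) -> str:
    run_model.calls += 1  # count real inferences, just for the demo
    return f"response to: {prompt}"
run_model.calls = 0

@lru_cache(maxsize=1024)
def cached_generate(prompt: str) -> str:
    # Identical prompts skip the model entirely on repeat calls.
    return run_model(prompt)

cached_generate("hello")  # real inference
cached_generate("hello")  # served from cache
print(run_model.calls)    # -> 1
```

This only works cleanly if your generation is deterministic per prompt (e.g. temperature 0); for sampled outputs you'd cache deliberately, knowing repeats return the same text.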
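And for the load-balancing point, the simplest version is just round-robin over your hosts. A sketch with made-up backend URLs (e.g. the 4090 box and the 4070 laptop):

```python
from itertools import cycle

# Hypothetical backend URLs -- replace with your actual inference hosts.
backends = cycle(["http://4090-box:8000", "http://4070-laptop:8000"])

def pick_backend() -> str:
    # Round-robin: each call hands out the next host in turn,
    # so requests alternate across the available machines.
    return next(backends)

print(pick_backend())  # http://4090-box:8000
print(pick_backend())  # http://4070-laptop:8000
print(pick_backend())  # http://4090-box:8000
```

A real setup would usually put nginx or a queue in front instead, but the idea is the same: spread requests so no single GPU becomes the bottleneck.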