Adapting a model from Spaces to Inference Endpoint

I have a custom inference pipeline working as a Spaces app, but it looks like there is no easy route to deploy it as an Inference Endpoint. Is that correct?

I have seen the custom inference handler notes Phil posted, but would love more detail, as my inputs and outputs are different from the one example given.

Thanks for any starting points or forum links I’ve overlooked…

Hello @plexus3d,

You can directly deploy your custom inference pipeline as an Inference Endpoint using a custom container. This would mean creating a model repository with your Space code and then using a custom Docker image with Gradio. Here is an example: philschmid/space-naver-donut-cord · Hugging Face.
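The custom-container route amounts to packaging the Space as a Docker image. A minimal sketch, with the caveat that the entrypoint file name and port below are assumptions rather than details taken from the linked example, so adapt them to your Space:

```dockerfile
# Sketch only: file names and port are assumptions -- adapt to your Space.
FROM python:3.9
WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

# Make Gradio listen on all interfaces on a fixed port; the endpoint's
# port configuration must match this value.
ENV GRADIO_SERVER_NAME=0.0.0.0 GRADIO_SERVER_PORT=7860
EXPOSE 7860

CMD ["python", "app.py"]
```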

Additionally, as you mentioned, you can create a custom inference handler. In the documentation, we have several examples of how to do this:

* Optimum and ONNX Runtime
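For orientation, a custom handler is a `handler.py` file at the root of the model repository that exposes an `EndpointHandler` class. Here is a minimal skeleton; the body of `__call__` is a placeholder (it just echoes the input), where a real handler would load your model in `__init__` and run actual inference:

```python
# handler.py -- minimal custom-handler skeleton for Inference Endpoints.
# The inference logic below is a placeholder, not a real model.
from typing import Any, Dict, List


class EndpointHandler:
    def __init__(self, path: str = ""):
        # `path` points at the model repository on disk;
        # a real handler would load weights/pipelines here.
        self.path = path

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, Any]]:
        # The endpoint passes the request body as a dict; "inputs" is the
        # conventional key. Whatever you return is serialized to JSON,
        # so you are free to shape your own inputs and outputs here.
        inputs = data.get("inputs", "")
        # Placeholder "inference": echo the input back.
        return [{"output": inputs}]
```

This is where you get full control over the input/output format you mentioned: `data` is the raw request payload, and the return value is the raw response.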

Or you can use the API of your existing Space directly.
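Calling a Space's Gradio API can be sketched with the standard library. The route (`/api/predict`) and payload shape (`{"data": [...]}`) below match Gradio 3.x and are an assumption here; check the "Use via API" link at the bottom of your Space for the exact route and parameters, and the Space URL is a placeholder:

```python
# Sketch: calling a Gradio Space's REST API with the standard library.
import json
import urllib.request


def build_request(space_url: str, inputs: list) -> urllib.request.Request:
    """Build the POST request that Gradio's /api/predict route expects."""
    payload = json.dumps({"data": inputs}).encode("utf-8")
    return urllib.request.Request(
        space_url.rstrip("/") + "/api/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
    )


def call_space(space_url: str, inputs: list) -> list:
    """Send the request and return the 'data' field of the JSON response."""
    with urllib.request.urlopen(build_request(space_url, inputs)) as resp:
        return json.loads(resp.read())["data"]


# Hypothetical usage -- replace with your own Space URL:
# outputs = call_space("https://your-name-your-space.hf.space", ["some input"])
```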

Thanks @philschmid for a characteristically fine and detailed response. Your last suggestion seems simplest: just calling the existing Gradio API in the existing Space.

At the risk of sidetracking, following your suggestion, is there a way to control Spaces via an API? That is, to set ‘running’ and ‘stopped’ states from code. I ask because I believe that once a Space is ‘running’, upgraded CPU/GPU hardware is billed 24/7. I’m looking to run a Space selectively. Any hints?

I am not sure about that one. I think it is possible to programmatically change the hardware, but I’m not sure about stopping a Space. You could ask in #spaces.