I am not new to ML/NLP, but I am new to HuggingFace, and I am trying to piece together what is involved in deploying a full solution, rather than just a model.
All the tutorials/docs I have read concentrate on deploying a model using Inference Endpoints or SageMaker; they all assume that the response from the model is all there is to the solution (e.g., get sentiment analysis from this blurb of text).
However, in my case, the model is just one part of a larger solution that also includes some post-processing and database querying after the NLP step. That means I would not only need to deploy the model, but also the database and all the other Python code needed for a complete solution.
In such a case, it seems like I would need, at a minimum:
- a reverse proxy to hide the secret tokens and limit bogus traffic
- deployed Python code containing the custom algorithms and the calls to the model and database
- the deployed model
- the deployed database
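To make that concrete, here is a very rough sketch of what I imagine the custom Python layer looking like. Everything in it is a placeholder (the endpoint URL, the environment variable names, the Postgres table), not working code from my project; the model call is just an HTTPS POST to the Inference Endpoint, with the token kept server-side so clients never see it:

```python
import os

import psycopg2  # assuming a Postgres database; could be any DB client
import requests

# Placeholders: the endpoint URL and token live in environment variables on the
# server, behind the proxy, so they are never exposed to callers.
HF_ENDPOINT_URL = os.environ["HF_ENDPOINT_URL"]
HF_TOKEN = os.environ["HF_TOKEN"]


def call_model(text: str):
    """POST the text to the deployed Inference Endpoint and return its JSON."""
    resp = requests.post(
        HF_ENDPOINT_URL,
        headers={"Authorization": f"Bearer {HF_TOKEN}"},
        json={"inputs": text},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()


def postprocess(model_output) -> str:
    """Stand-in for my custom algorithms.

    The exact response shape depends on the task/model behind the endpoint.
    """
    return model_output[0]["label"] if isinstance(model_output, list) else str(model_output)


def handle_request(text: str) -> dict:
    """Model call -> custom post-processing -> database query."""
    label = postprocess(call_model(text))

    # Look up related records based on the NLP result (table/columns are made up).
    with psycopg2.connect(os.environ["DATABASE_URL"]) as conn:
        with conn.cursor() as cur:
            cur.execute("SELECT payload FROM results WHERE label = %s", (label,))
            rows = cur.fetchall()

    return {"label": label, "matches": rows}
```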
I really, really want to limit the complexity of doing all of this, so Inference Endpoints seems worth the extra cost for the convenience. However, since I would also be deploying a database, additional Python code, and a proxy server, and HuggingFace does not provide those pieces, I am worried about latency between my Python code and the model calls. Is that mitigated at all if you deploy the Python code and database to the same cloud provider that you use for Inference Endpoints (AWS or Azure)?
Does anyone have a recommendation on how best to deploy all of this? Before learning about Inference Endpoints, I was assuming I would have to package everything into one or more Docker images and deploy them to something like Cloud Run. But as I said, I would rather not spend a lot of time and effort on DevOps, figuring out how to package things up and deploy them.
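For what it's worth, my mental model of the Cloud Run route is just a thin HTTP wrapper around the pipeline sketched above, baked into a Docker image. Again, this is only a sketch with made-up module names, not something I have running:

```python
# app.py - minimal HTTP wrapper that could be built into a Docker image
# and deployed to something like Cloud Run.
from fastapi import FastAPI
from pydantic import BaseModel

from pipeline import handle_request  # the function sketched above, assumed to live in pipeline.py

app = FastAPI()


class Query(BaseModel):
    text: str


@app.post("/analyze")
def analyze(query: Query) -> dict:
    # Secrets stay in the container's environment; the client only sees this route.
    return handle_request(query.text)
```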
Any help would be greatly appreciated!