What is the difference between the model API and the pipeline API?

Hottuck · November 15, 2024, 9:14am

Hello,
I tested two APIs to utilize the BGE-M3 embedding:

"https://api-inference.huggingface.co/models/BAAI/bge-m3"
"https://api-inference.huggingface.co/pipeline/feature-extraction/BAAI/bge-m3"

When using the first API, there is a noticeable load time before receiving the response.
However, when using the second API, the results are returned immediately.

What is the difference between these two APIs?
Also, does the second API always guarantee immediate responses?

I couldn’t find any related documentation or comparison materials, so I’m asking for clarification.

encryptman · November 15, 2024, 9:53am

The first option is a serverless API for inference. A request to it can be slow, or fail outright, if the model is not loaded yet. Pipelines are often optimized for speed and may keep the model loaded in memory, allowing for immediate responses.

That said, the pipeline API does not guarantee an immediate response in every scenario. Server load, network latency, and the complexity of the input data can still affect response times, so expect it to be faster in general but not always instantaneous.
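If the load delay is a problem on the client side, one option is to retry while the model warms up. The following is only a minimal sketch under assumptions that are not confirmed in this thread: that a still-loading model answers with HTTP 503, and that the JSON body may carry an `estimated_time` field in seconds. The `send` callable and the name `call_with_cold_start_retry` are hypothetical, introduced here so the logic can run without network access.

```python
import time
from typing import Any, Callable, Tuple

# A sender is any zero-argument callable that performs ONE HTTP request and
# returns (status_code, parsed_json_body). Hypothetical hook, not an official API.
Sender = Callable[[], Tuple[int, Any]]


def call_with_cold_start_retry(
    send: Sender,
    max_attempts: int = 5,
    default_wait: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call `send` until it succeeds, pausing while the model is loading.

    Assumption: status 503 means "model is loading" and the body may include
    an `estimated_time` hint in seconds. Any other non-200 status is an error.
    """
    for _ in range(max_attempts):
        status, body = send()
        if status == 200:
            return body
        if status == 503:
            # Use the server's hint when present, else a fixed fallback.
            wait = default_wait
            if isinstance(body, dict) and "estimated_time" in body:
                wait = float(body["estimated_time"])
            sleep(wait)
            continue
        raise RuntimeError(f"request failed with status {status}: {body}")
    raise TimeoutError(f"model still not ready after {max_attempts} attempts")
```

In real use, `send` would wrap a single POST to whichever of the two URLs you are testing and return the status code together with the decoded JSON. Keeping the HTTP call outside the retry loop makes it easy to compare both endpoints with the same logic.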

luc01234 · July 22, 2025, 6:33am

"https://api-inference.huggingface.co/models/BAAI/bge-m3": This endpoint is the more generic “model” inference endpoint. It’s designed to be flexible and support various tasks, depending on how the model is configured and the input you provide. It can be used for feature extraction, but it’s not specifically optimized for that task.

"https://api-inference.huggingface.co/pipeline/feature-extraction/BAAI/bge-m3": This endpoint is a dedicated “pipeline” endpoint specifically configured for feature extraction using the BAAI/bge-m3 model. It’s pre-configured for this task, so the Hugging Face Inference API knows exactly how to load the model and process your input.
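To make the comparison concrete, here is a rough sketch of how the two URLs from the original question are put together and how the same request body could be sent to either one. The helper names (`model_url`, `pipeline_url`, `build_request`) are hypothetical, and the `{"inputs": [...]}` payload shape and bearer-token header are assumptions about the hosted API rather than something stated in this thread.

```python
import json
import urllib.request
from typing import List

API_ROOT = "https://api-inference.huggingface.co"


def model_url(model_id: str) -> str:
    """Generic endpoint: the task is inferred from the model's configuration."""
    return f"{API_ROOT}/models/{model_id}"


def pipeline_url(task: str, model_id: str) -> str:
    """Task-specific endpoint: the task is named explicitly in the path."""
    return f"{API_ROOT}/pipeline/{task}/{model_id}"


def build_request(url: str, texts: List[str], token: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST request carrying the input texts.

    Assumption: the API accepts a JSON body of the form {"inputs": [...]}
    and a bearer token in the Authorization header.
    """
    body = json.dumps({"inputs": texts}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

Passing `model_url("BAAI/bge-m3")` or `pipeline_url("feature-extraction", "BAAI/bge-m3")` to `build_request` gives two requests that differ only in the path, which is the whole visible difference between the endpoints from the client's point of view. Sending one with `urllib.request.urlopen(req)` needs a valid token and network access.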
