How to calculate the price of hosting transformers for semantic search

I have been asked by my manager to discuss the cost of building the following:

  1. A semantic search engine for car images and car parts
  2. Semantic search, but for words rather than images
  3. A recommendation system

I love Qdrant and want to use it in production. For our business case, I must establish some baseline for these projects before starting work on them.
I don’t have enough experience with cloud computing, so can someone give me guidance on how to manage this discussion? I mainly want to use Qdrant, but their cloud price is really high for my region and for the manager I work with!

  1. How big is your dataset?
  2. And how are you creating the embeddings, i.e. converting your images and docs to vectors?
  3. Will you be using an existing model, or training/fine-tuning one for creating embeddings?
  4. Then you layer in things like redundancy, replication, performance, and security, which should be priced into the cloud offering to decide whether it’s “worth” it.
  1. The dataset size is not known exactly, but for image semantic search assume about 1M images; for text semantic search I will just host an mBERT model.
  2. For the embeddings, I will use CLIP for image semantic search, and for text I will use the following model: medmediani/Arabic-KW-Mdel · Hugging Face.
  3. I will use existing models from Hugging Face.
    I want to store the embeddings in a vector database like Qdrant (a rough sketch of the embedding step is below).
    That’s the info I know for now.
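Roughly, the embedding step I have in mind looks like the sketch below, assuming a sentence-transformers CLIP checkpoint; the model name, image path, and query string are placeholders, not a final choice:

```python
from PIL import Image
from sentence_transformers import SentenceTransformer

# CLIP ViT-L/14 produces 768-dimensional embeddings
# (the smaller ViT-B/32 checkpoint would give 512 dimensions).
model = SentenceTransformer("clip-ViT-L-14")

# CLIP embeds images and text into the same space,
# so one model can serve both image and text queries.
image_vector = model.encode(Image.open("car_001.jpg"))   # placeholder image path
text_vector = model.encode("front bumper, 2018 sedan")   # placeholder query

print(image_vector.shape)  # (768,)
```

The Arabic keyword model could be loaded the same way if it is published as a sentence-transformers checkpoint; otherwise it would go through the plain transformers API with pooling applied on top.
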
Let’s say you’ve 1M images and 1M text pieces.

Let’s assume we use something like CLIP embeddings for this → 768 dimensions.

For each 1M set, to have the lowest latency possible, i.e. keep everything in RAM, you’d want to index about 2.86 GB of vectors. So, leaving some room for the index, we’d need about ~5 GB of RAM.
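For reference, the 2.86 GB figure is just the raw float32 storage for 1M × 768-dim vectors; here’s a quick back-of-the-envelope check (the extra headroom for the index is an estimate, not a measured number):

```python
vectors = 1_000_000
dims = 768
bytes_per_float = 4  # float32

raw_gib = vectors * dims * bytes_per_float / 1024**3
print(f"raw vectors: {raw_gib:.2f} GiB")  # ~2.86 GiB

# HNSW graph links and general overhead come on top of the raw vectors,
# which is why ~5 GB of RAM is a comfortable target for 1M vectors.
```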

Assume you’re using an 8 GB RAM machine on AWS us-east-1; this would be $0.0504 hourly on the most expensive end of things.

With a better configuration, e.g. storing some of the payload on disk, you can use a 4 GB machine instead – at $0.0385 hourly.
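For context, this is roughly what the on-disk payload option looks like with the Python qdrant-client; the URL and collection name are placeholders:

```python
from qdrant_client import QdrantClient, models

# Placeholder URL for a self-hosted Qdrant instance.
client = QdrantClient(url="http://localhost:6333")

client.create_collection(
    collection_name="car_images",  # placeholder collection name
    vectors_config=models.VectorParams(
        size=768,                         # CLIP ViT-L/14 dimensionality
        distance=models.Distance.COSINE,
    ),
    on_disk_payload=True,  # keep payloads (metadata) on disk instead of RAM
)
```

If RAM is the main constraint, the original vectors can also be memory-mapped to disk via `models.VectorParams(..., on_disk=True)`, trading a bit of latency for a smaller footprint.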

This is still on the more expensive end of things. If you can share a bit more about how many RPS (requests per second) you’d need, we can have a more accurate pricing discussion with you.
I think I know the cost of storing the embeddings, but I wanted to hear your answer. As for hosting the models on a server, I don’t know which instance type or which GPU I should use, etc. Any information or guidance will help me a lot!
