BGE-M3 SKU - Microsoft Azure

I’ve been told by the Microsoft ML team to reach out to the HF forum regarding SKU enablement for the BGE-M3 embedding model.

Information attached below on serving specifics:

We would like to enable the following SKUs for the model:
Standard NC4as T4 v3 (NVIDIA T4, 16 GB)
Standard NV6ads A10 v5 (fractional NVIDIA A10, ≈ 12 GB)

Let me give you more information on this request below:

This BGE model has roughly ~1B parameters, which takes up about 1 GB of VRAM for the weights. Embedding computations can spike VRAM consumption, but nowhere near the A100 SKU's capacity (Standard_NC24ads_A100_v4: 24 cores, 220 GB RAM, 64 GB GPU memory). 64 GB is significantly more than we would need, even under constant inferencing (which isn't our use case). A fractional NV-series or T4-based partition (e.g., Standard_NV12s_v3) would suffice, since our use case needs <4 GB of VRAM, but none of these seem customizable; even an L4 or A10 would be overkill. That is an additional reason we would prefer to pay per-inference for this embedding model. However, if you could enable the smaller GPU SKUs listed above, we could proceed via a cost-effective hourly deployment instead.
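For context on the VRAM figures above, here is a back-of-the-envelope sketch (the ~1B parameter count is taken from this post; note that ~1 GB for 1B parameters implies int8 weights, while fp16 would need roughly 2 GB):

```python
# Rough VRAM estimate for holding an embedding model's weights.
# N_PARAMS is an assumption taken from the post, not a measured value;
# activation/batch overhead during inference is not included.

def weight_vram_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate VRAM needed just for the weights, in GiB."""
    return n_params * bytes_per_param / 1024**3

N_PARAMS = 1e9  # "roughly ~1B parameters"

for label, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_vram_gb(N_PARAMS, nbytes):.1f} GiB")
```

Even the fp32 case (~4 GiB) fits comfortably on a 16 GB T4 or a fractional 12 GB A10, which is the basis of the request.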
