BGE-M3 SKU - Microsoft Azure

I’ve been told by the Microsoft ML team to reach out to the HF forum regarding SKU enablement for the BGE-M3 embedding model.

Information attached below on serving specifics:

We would like to enable the following SKUs for the model:
Standard NC4as T4 v3 (NVIDIA T4, 16 GB)
Standard NV6ads A10 v5 (fractional NVIDIA A10, ≈ 12 GB)

Let me give you more information on this request below:

This BGE model has roughly ~1B parameters, which takes up about 1 GB of VRAM for the weights. Embedding computations can spike VRAM consumption, but nowhere near the A100 SKU's capacity (Standard_NC24ads_A100_v4: 24 cores, 220 GB RAM, 64 GB GPU memory). 64 GB is significantly more than we would need, even under constant inferencing (which isn't our use case). A fractional NV-series or T4-based partition (e.g., Standard_NV12s_v3) would suffice, since our use case needs <4 GB of VRAM, but none of these seem customizable; even an L4 or A10 would be overkill. That is an additional reason we would prefer to pay per-inference for this embedding model. However, if you could enable the smaller GPU SKUs listed above, we could proceed via a cost-effective hourly deployment instead.
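For context on the VRAM figures above, here is a back-of-the-envelope sketch (the ~1B parameter count is taken from this post; note that ~1 GB for 1B parameters implies int8 weights, while fp16 would need roughly 2 GB):

```python
# Rough VRAM estimate for holding an embedding model's weights.
# N_PARAMS is an assumption taken from the post, not a measured value;
# activation/batch overhead during inference is not included.

def weight_vram_gb(n_params: float, bytes_per_param: int) -> float:
    """Approximate VRAM needed just for the weights, in GiB."""
    return n_params * bytes_per_param / 1024**3

N_PARAMS = 1e9  # "roughly ~1B parameters"

for label, nbytes in [("fp32", 4), ("fp16/bf16", 2), ("int8", 1)]:
    print(f"{label}: ~{weight_vram_gb(N_PARAMS, nbytes):.1f} GiB")
```

Even the fp32 case (~4 GiB) fits comfortably on a 16 GB T4 or a fractional 12 GB A10, which is the basis of the request.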
