Is it possible to run a paid inference API at Hugging Face for large LLMs? For example, I want to evaluate a few custom datasets on BLOOM/OPT models. I understand that it's not feasible for Hugging Face to provide thousands of free API calls. But can I just pay for it and get inference output on publicly available models at Hugging Face?
cc @jeffboudier
Thanks for asking! To evaluate models beyond the free tier of the Inference API, you can use our paid inference solution, Inference Endpoints, and deploy any model on dedicated infrastructure for your use (billed by capacity, not by requests). Note that for >10Bn models the available hardware instances may not fit the model (e.g. BLOOM 176Bn, OPT 175Bn), which may require you to request quota for large instances, or a custom quote / deployment.
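For reference, a public model on the Inference API can be queried over plain HTTP with an access token. A minimal sketch, assuming a text-generation model and a token stored in the `HF_TOKEN` environment variable (model ID and generation parameters are just examples):

```python
import os
import requests

# Hosted Inference API endpoint for a public model (bigscience/bloom used as an example).
API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}  # your HF access token

payload = {
    "inputs": "Translate to French: The weather is nice today.",
    "parameters": {"max_new_tokens": 50},  # generation options vary by task/model
}

response = requests.post(API_URL, headers=headers, json=payload)
response.raise_for_status()
print(response.json())  # typically a list like [{"generated_text": "..."}]
```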
Thanks a lot for the fast reply.
According to this page, it seems that right now it's not possible to deploy the 176B model. Or is there any other way we can do that?
I can already see that BLOOM is deployed on Azure: bigscience/bloom · Hugging Face.
Is it on CPU?
Right now, as a researcher, I only need inference results for a few datasets, which may not require more than a couple of hours each. Is it OK to request a custom quote/deployment for that?
Note that I want to host "google/flan-t5-xxl" (11B), "bigscience/bloomz" (176B), or OPT (175B) (depending on performance). I'm not sure which hardware to use, though, since A100 nodes are not available, and judging by the availability of p4de.24xlarge on AWS, I'm not sure when we will get one.
@jeffboudier Any suggestions on hardware selection for the "google/flan-t5-xxl" (11B), "bigscience/bloomz" (176B), and OPT (175B) models?
Hi - for the 11B model, you can check out this tutorial on running T5-11B on a T4 via Inference Endpoints: Deploy T5 11B for inference for less than $500
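Once an Endpoint is running, it exposes its own URL that accepts the same payload format as the Inference API. A rough sketch (the endpoint URL below is a placeholder you would copy from your Endpoint's overview page):

```python
import os
import requests

# Placeholder URL -- replace with the real one shown on your Inference Endpoint page.
ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

payload = {"inputs": "Summarize: Hugging Face hosts thousands of open models."}
response = requests.post(ENDPOINT_URL, headers=headers, json=payload)
print(response.json())
```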
BLOOMZ and OPT would require an 8xA100 80GB instance (e.g. p4de.24xlarge on AWS), which wouldn't make sense to set up / grant for a couple of hours; maybe your usage falls within the free limit of the Inference API? Azure is sponsoring free inference for BLOOM (not BLOOMZ) in the Inference API - hosted on an 8xA100 80GB.
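If the free Inference API limits do cover your use case, a small evaluation run can simply loop over your dataset and collect generations. A rough sketch, assuming your custom dataset is a list of prompt strings and you are targeting the hosted BLOOM model (the API may return 503 while a model is loading, so a retry is included):

```python
import os
import time
import requests

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

# Hypothetical custom dataset: one prompt per example.
prompts = [
    "Question: What is the capital of France?\nAnswer:",
    "Question: Who wrote Hamlet?\nAnswer:",
]

results = []
for prompt in prompts:
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 32}}
    resp = requests.post(API_URL, headers=headers, json=payload)
    if resp.status_code == 503:  # model still loading -- wait and retry once
        time.sleep(resp.json().get("estimated_time", 30))
        resp = requests.post(API_URL, headers=headers, json=payload)
    resp.raise_for_status()
    results.append(resp.json()[0]["generated_text"])
    time.sleep(1)  # be gentle with the shared free tier

for r in results:
    print(r)
```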
Thanks a lot for the suggestion. @jeffboudier