Hello @bala1802.
Do you mind sharing your testing scripts? Inference time can vary widely depending on the input and on the actual parameters used to generate the text.
1/ Did you use the `use_gpu` flag to actually use the GPU for inference?
I’m seeing a 6s inference time on my test string:

```
curl -X POST -d '{"inputs": "toto", "options": {"use_gpu": true, "use_cache": false}}' https://api-inference.huggingface.co/models/balawmt/LanguageModel_Trial_1 -H "Authorization: Bearer ${HF_API_TOKEN}" -D -
```
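If it's easier to script your timing runs, here is a minimal sketch of the same request in Python (plain `requests`, not an official client), assuming `HF_API_TOKEN` is set in your environment:

```python
import os
import time
import requests

API_URL = "https://api-inference.huggingface.co/models/balawmt/LanguageModel_Trial_1"
headers = {"Authorization": f"Bearer {os.environ['HF_API_TOKEN']}"}

payload = {
    "inputs": "toto",
    # use_gpu to run on GPU, use_cache=False so every call is a real run
    "options": {"use_gpu": True, "use_cache": False},
}

start = time.time()
response = requests.post(API_URL, headers=headers, json=payload)
print(f"{time.time() - start:.2f}s", response.json())
```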
2/ First time vs. second time should not really make a difference. Are you trying 2 different payloads?
3/ The actual run time of a query on a `text-generation` pipeline can depend on the EOS token being generated randomly (otherwise it will simply generate `max_tokens`, which seems to be set to 500 for your model). So when trying to measure inference time, you need to make sure that you are generating the same number of tokens and that EOS cannot be generated, e.g. as in the sketch below.
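For instance, a local benchmark along these lines keeps the generated length fixed (a sketch, assuming a `transformers` version recent enough to support `min_new_tokens`; setting it equal to `max_new_tokens` prevents EOS from ending generation early):

```python
import time
from transformers import pipeline

generator = pipeline("text-generation", model="balawmt/LanguageModel_Trial_1")

start = time.time()
out = generator(
    "toto",
    min_new_tokens=128,  # forbid EOS before 128 new tokens
    max_new_tokens=128,  # and stop exactly there
    do_sample=False,     # greedy decoding for reproducible runs
)
print(f"{time.time() - start:.2f}s", out[0]["generated_text"][:80])
```

That way every run produces the same number of tokens and the timings are comparable.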
Hope that helps.