Organization Pricing

Regarding: I’d like to understand the configuration of the Start Up plan for Organizations.

I have a language model fine-tuned on top of the pretrained GPT-2 Small. When I deployed the
LanguageModel on AWS SageMaker, Google Colab, and Hugging Face, I observed the results below.

From the table above, can you please help me figure out the ? values?

Hello @bala1802 .

Do you mind sharing your testing scripts? Inference time can vary widely depending on the input and the actual parameters used to generate the text.

1/ Did you use the `use_gpu` flag to actually run inference on the GPU?

I’m seeing 6s inference on my test string.

```shell
curl -X POST \
  -d '{"inputs": "toto", "options": {"use_gpu": true, "use_cache": false}}' \
  https://api-inference.huggingface.co/models/balawmt/LanguageModel_Trial_1 \
  -H "Authorization: Bearer ${HF_API_TOKEN}" \
  -D -
```
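For anyone who prefers timing this from Python, the same request can be sketched with the `requests` library. This is a minimal sketch, not an official client: `build_request` and `query` are illustrative helper names I made up, and I'm assuming `HF_API_TOKEN` is set in the environment as in the curl example.

```python
import json
import os

# Model URL from the curl example above.
API_URL = "https://api-inference.huggingface.co/models/balawmt/LanguageModel_Trial_1"

def build_request(text, use_gpu=True):
    """Build the Inference API payload; use_cache=False forces a fresh run
    so the measured time reflects real inference, not a cache hit."""
    return {"inputs": text, "options": {"use_gpu": use_gpu, "use_cache": False}}

def query(payload, token=None):
    """POST the payload to the hosted Inference API (network call).
    Assumes the third-party `requests` package is installed."""
    import requests
    headers = {"Authorization": f"Bearer {token or os.environ.get('HF_API_TOKEN', '')}"}
    response = requests.post(API_URL, headers=headers, data=json.dumps(payload))
    return response.json()
```

Wrapping a `query(build_request("toto"))` call with `time.perf_counter()` before and after gives a quick end-to-end latency number.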

2/ First time vs. second time should not really make a difference. Are you trying two different payloads?

3/ The actual run time of a query on a text-generation pipeline can depend on when the EOS token happens to be sampled (otherwise it will simply generate up to max_tokens, which seems to be set to 500 for your model). So when benchmarking inference time, you need to make sure that you are generating the same number of tokens and that EOS cannot end generation early.
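To make point 3 concrete, here is one way the generated length could be pinned for a fair benchmark. This is a sketch under assumptions: `max_new_tokens` is a standard text-generation parameter, and `min_new_tokens` mirrors the `transformers` `generate()` argument that suppresses EOS until the minimum is reached; I'm assuming the hosted API forwards it, so check the API docs before relying on it.

```python
def build_benchmark_request(text, n_tokens=100):
    """Payload that pins the generated length so timings are comparable
    across runs: max_new_tokens caps generation, and min_new_tokens
    (assumed to be forwarded to generate()) stops EOS from ending it early."""
    return {
        "inputs": text,
        "parameters": {
            "max_new_tokens": n_tokens,
            "min_new_tokens": n_tokens,
        },
        "options": {"use_gpu": True, "use_cache": False},
    }
```

With both bounds equal, every run produces exactly `n_tokens` tokens, so differences in wall-clock time reflect the hardware rather than random stopping points.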

Hope that helps.