How to reduce Inference API costs for long format text generation?

Hi, I’m considering building out some apps using the Inference API, with GPT-Neo and maybe GPT-2. It seems really awesome. Something I realized very quickly, though, is how fast you could rack up a huge bill using the API, especially when generating long-form text.

For example, the “Question Answering” example from https://huggingface.co/blog/few-shot-learning-gpt-neo-and-inference-api is a little over 1,000 characters per request. On the Supporter plan ($25 per million input characters), that works out to about two and a half cents per generation, which for any app would add up SUPER quick. If a user of your app did 10 generations, that's 25 cents; if that user did the same every week, it's a dollar per month, per user. And that's assuming they don't do more than ten generations, since for some use cases ten wouldn't be enough. Using it in any app that doesn't charge users a monthly fee would be impossible. Even if you eventually switched to the Startup plan ($599/mo), it would still be about 1 cent per generation, which would be way too much for most apps; even just having ads on your app wouldn't cover it. All of the Inference API payment plans are far more expensive for long-form generation than GPT-3's API, even the Davinci model.
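To make the arithmetic above concrete, here's a quick back-of-envelope sketch (the rates and usage numbers are just the ones quoted above, not official pricing; check the current pricing page):

```python
def cost_per_generation(chars_per_request: int, price_per_million_chars: float) -> float:
    """Dollar cost of one generation under character-based pricing."""
    return chars_per_request * price_per_million_chars / 1_000_000

# Supporter plan figures from the post: ~1,000 chars/request at $25/M chars.
per_gen = cost_per_generation(1_000, 25)
print(f"${per_gen:.3f} per generation")          # $0.025, i.e. 2.5 cents

# 10 generations per week, ~4 weeks per month:
per_user_monthly = per_gen * 10 * 4
print(f"${per_user_monthly:.2f}/user/month")     # $1.00
```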

I’m not trying to complain here; the Inference API is awesome. I'm just looking for tips on how to reduce input length, or maybe something I'm missing overall. Or maybe the Inference API just isn't right for non-SaaS or low-cost SaaS apps, or perhaps I'm looking at this from the wrong perspective entirely. Anything would help! Thank you.
