How does organization billing work for API usage?

I’m trying to understand how organization billing works. First, some context: we are using an open source model and do not host anything in the organization, so using an API and paying for usage seems ideal. A credit card was added to the organization but not to the user accounts.

There are two usage patterns:

  1. Low volume requests from a developer.
  2. High volume requests from a batch job, in the range of tens of thousands of requests.
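Since the free Inference API is rate limited, the batch-job pattern would need some client-side pacing and retrying. Here is a minimal sketch of what I mean; the retry count, base delay, and `MAX_DELAY` cap are my own assumptions, not documented limits:

```python
import time

MAX_DELAY = 60.0  # assumed cap on backoff, not an official limit


def backoff_delay(attempt: int, base: float = 1.0) -> float:
    """Exponential backoff delay for a given retry attempt (0-based)."""
    return min(base * (2 ** attempt), MAX_DELAY)


def run_batch(items, send, max_retries=5):
    """Send each item, retrying with exponential backoff on rate-limit errors.

    `send` is a hypothetical callable that returns True on success and
    False when the API answers with a rate-limit error (e.g. HTTP 429).
    """
    results = []
    for item in items:
        for attempt in range(max_retries):
            if send(item):
                results.append(item)
                break
            time.sleep(backoff_delay(attempt))
    return results
```

A real `send` would wrap an HTTP POST to the hosted model; since the actual rate limits per tier aren’t published, the schedule above is a guess.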

The Inference API is the first option listed when deploying a model, yet the pricing page does not mention it at all. This could be because it’s advertised as “free”. Is there a paid tier? After much digging, I found the Inference for PROs blog post, which mentions increased rate limits and a few other benefits over the “free” tier. Can an organization pay for a user’s PRO account? How does that work? Note: the pricing page does not mention multiple Inference API tiers, and the Accelerated Inference API docs do not mention the PRO version. This is very confusing!

The Inference API section about parallelism and batch jobs mentions Spaces, so let’s talk about Spaces. The Spaces overview and Spaces page imply that Spaces is meant for demos and portfolio apps, which contradicts the implication that Spaces can be used for batch jobs. If Spaces is suitable for batch jobs, how does the billing work? Org API keys are deprecated. Will Spaces know to use organization billing if the user’s billing is empty? How does that work?

The Spaces GPU Upgrades documentation has a section on billing but it doesn’t state anything about organization billing. The Billing documentation hints that organization billing is only for Enterprise Hub subscriptions since “PRO subscription” is “for users” and “Enterprise Hub subscriptions” are “for organizations”. If I’m only interested in using the API (to access hosted models), is it necessary to subscribe to Enterprise Hub for the sole purpose of moving the billing from the user to the organization?

I haven’t even gotten to Inference Endpoints, which requires managing IAM permissions for EC2 instances created by Hugging Face. I don’t want to deal with the EC2 permissions, and if I can use an API to access a hosted model AND have the organization pay for usage, that would be ideal. Any insight you can provide is greatly appreciated.

No reply usually means nobody knows. I’ll try emailing them. Does anyone know the email address for billing?

Hi,

Inference solutions

Currently there are two inference solutions, namely the free Inference API and Inference Endpoints. The Inference API is free and rate limited, aimed at playing around with / demoing a machine learning model. It only supports native models from the Transformers, Timm, and Diffusers libraries (see the docs here). For production use cases, we refer to Inference Endpoints, which provides an interface to deploy a model from the Hub easily with a few clicks. In that case, you get billed for the compute. Inference Endpoints lets you deploy any (custom) model, which could be a Hugging Face model that you fine-tuned or a sklearn model, for instance.
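For reference, a single call to the free Inference API looks roughly like this. The URL pattern follows the Inference API docs; the model name and token below are placeholders, and only stdlib `urllib` is used so nothing extra needs installing:

```python
import json
import urllib.request

API_URL = "https://api-inference.huggingface.co/models/{model}"


def build_request(model: str, inputs: str, token: str) -> urllib.request.Request:
    """Build an authenticated POST request for the hosted Inference API."""
    return urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps({"inputs": inputs}).encode("utf-8"),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )


# Actually sending it requires a valid token and network access:
# with urllib.request.urlopen(build_request("gpt2", "Hello", "hf_...")) as resp:
#     print(json.load(resp))
```

Whether the request is billed to the user or the organization is exactly the open question here; the token used in the `Authorization` header determines the account.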

We also offer a PRO account, which gives you higher rate limits when using the free Inference API: Inference for PROs.

Billing always happens through a Hugging Face account, which can be either an individual or an organization. There’s no need to upgrade to the Enterprise Hub to get billed as an organization. See the billing settings in your profile settings: Billing. At the bottom, you can view the billing of the organizations your user account is part of.

See also: Change Organization or Account for billing of Inference Endpoints.

Spaces

Spaces are a hosting environment for machine learning demos, typically built using Gradio or Streamlit. We now also support Docker, so you can deploy any Docker container. Often you need stronger hardware to showcase your model, which is why we offer Spaces hardware. This includes both CPUs and GPUs, priced per hour that they run. Here too, billing can happen through your personal user account or at the organization level, depending on the account that hosts the Space.
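Since Spaces hardware is billed per hour of runtime, the cost is easy to estimate. A quick sketch; the $0.60/hour rate is an assumed example, not a quoted price, so check the pricing page for actual rates:

```python
def spaces_cost(hours_running: float, hourly_rate_usd: float) -> float:
    """Estimated cost of a Space billed at an hourly hardware rate."""
    return round(hours_running * hourly_rate_usd, 2)


# e.g. a GPU Space left running for a 40-hour work week
# at an assumed (hypothetical) $0.60/hour rate
weekly_usd = spaces_cost(40, 0.60)
```

The same arithmetic applies whether the charge lands on the personal account or the organization; only the paying account changes.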

Spaces are indeed not meant for batch jobs. Spaces are meant for showcasing a machine learning demo.

Enterprise Hub

The Enterprise Hub offers a way to collaboratively work on machine learning within an organization (see it as GitHub/GitLab but for machine learning), with guaranteed security regarding hosting. It lets you use the Hugging Face Hub as you’re used to, while also letting you host private ML models/demos on which teams can collaborate. See Enterprise Hub - Hugging Face.

Let me know if you have additional questions :slight_smile: We also have a support email at api-enterprise@huggingface.co if you have further questions.

Thank you for your response.

We also offer a PRO account, which allows to have higher rate limits when using the free inference API: Inference for PROs.

When I went to sign up for a PRO account, it asked me for credit card information even though billing information is already on file for the organization. This implies PRO accounts are for users only, not organizations. Therefore, the “higher rate limits” for the Inference API provided by a PRO plan cannot be paid for by organizations. Is this correct?

I still need to work through Inference Endpoints to understand how that works.

Inference API is free and rate limited, aimed at playing around/demo’ing with a machine learning model.

This is not true. When using the Inference API, I received the following error. It seems the Inference API is free and rate limited for some models; other models require a PRO account.

Model requires a Pro subscription; check out hf.co/pricing to learn more. Make sure to include your HF token in your query.
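Since this restriction surfaces only as an error message at request time, a client has to detect it from the response body. A hedged sketch; the JSON shape with an `"error"` key is an assumption based on the message above, not documented behavior:

```python
import json

PRO_MARKER = "requires a pro subscription"


def requires_pro(response_body: str) -> bool:
    """Return True if an Inference API error body says the model is PRO-only.

    Assumes the API returns JSON like {"error": "Model requires a Pro
    subscription; ..."}; falls back to a plain-text scan otherwise.
    """
    try:
        message = json.loads(response_body).get("error", "")
    except (json.JSONDecodeError, AttributeError):
        message = response_body
    return PRO_MARKER in str(message).lower()
```

A batch job could use this check to fail fast on gated models instead of retrying as if it had been rate limited.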

Searching for the error message led me to this post from September 2023.

We’ll release some docs on this soon. At the moment, only Llama 2 chat models require PRO.

I checked the docs, and they do not state anything about “Llama 2” requiring a PRO subscription plan. Hugging Face’s idea of “soon” is more than several months.

This brings me back to my original question which is answered indirectly by lack of an answer: Organizations cannot pay for PRO accounts for developers.

From an organizational standpoint, the only way to develop with Llama 2 models is to upgrade my personal account using the organization’s credit card and then add my personal access token to GitHub Secrets. This is not a desirable path, nor is it clearly explained in the docs or the forums.
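For completeness, the workaround above (a personal token stored as a GitHub secret) reaches the code through an environment variable. A minimal sketch; `HF_TOKEN` is just the variable name I’d pick, since the secret name is your choice:

```python
import os


def get_hf_token(var: str = "HF_TOKEN") -> str:
    """Read the Hugging Face access token from the environment.

    In GitHub Actions the secret is exposed via the workflow's env block,
    e.g.  env: { HF_TOKEN: ${{ secrets.HF_TOKEN }} }
    """
    token = os.environ.get(var)
    if not token:
        raise RuntimeError(f"{var} is not set; add it as a repository secret")
    return token
```

The awkward part remains that the token, and therefore the billing, belongs to an individual user rather than the organization.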

Let me know if you have additional questions :slight_smile: we also have a support email at api-enterprise@huggingface.co if you have further questions

I never received a response from api-enterprise@huggingface.co that I sent 3 weeks ago.

To elaborate on Llama 2 requiring PRO subscription:

Yes, to use it with the Inference API you need a PRO subscription, since it’s too large (roughly 13 GB, above the 10 GB free API limit). Of course, you could run it locally without any error.

I could not find a reference to this in the docs. To state it differently: the “free” Inference API is rate limited (by number of requests?), available only for models smaller than 10 GB, and (possibly) excludes all Llama 2 models.