It’d be super helpful to require or at least suggest that AI models include system requirements, just like other software. Minimum and recommended specs, especially for inference with Hugging Face libraries, would make things easier. Hardware info is often hard to find, and not everyone has access to H100 clusters. Setting this as a standard would make models way more accessible.
HF is collecting feature requests for the site, and some of the suggestions there have already been implemented. For generative AI models, for example, it is difficult to determine an exact recommended value, because the VRAM required can drop to roughly a quarter with quantization. Even with that ambiguity, though, it would be easier if we had an approximate value.
Thanks for that link!
I’ve added the suggestion there too: link to suggestion post
If you think it’s a good idea, add a thumbs up; it might get more exposure.
I’m sure we could settle on some baseline; it could be the requirements for non-quantized inference of the default model.
As you say, having an approximate value would indeed be helpful.
If we have a baseline, we can estimate the other variants fairly well.
Of course I gave it a thumbs up. I try to gather as many people’s opinions and suggestions as possible, but HF gets the most accurate picture when many people make suggestions directly rather than through an intermediary. There are actually quite a few features that are easy to implement once someone has the idea. Some features are harder than others…
Quantization usually goes down to 4-bit, so as long as a reference value is displayed, the rest is simple division and multiplication. Recently, 2.5-bit quantization has appeared, but that is an exception.
At the moment, the capacity of a model stored in 16-bit float is roughly the amount of VRAM required to run it. You can also get a rough estimate from the number of parameters, such as 8B or 3B. But you have to know the conversion, and it’s a pain to remember. It’s like miles and yards for non-Americans; we’d be happier if it were displayed in meters.
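To make that concrete, here is a rough back-of-the-envelope sketch in Python. The `estimate_vram_gb` helper and the 20% runtime overhead factor are my own assumptions, not anything Hugging Face publishes; it just turns the parameter count and bit width into an approximate size.

```python
# Rough estimate of VRAM from parameter count and bits per weight.
# The 20% overhead factor (activations, CUDA context, etc.) is a loose
# assumption, not an official figure.
def estimate_vram_gb(n_params: float, bits: int = 16, overhead: float = 1.2) -> float:
    weight_bytes = n_params * bits / 8    # bits per parameter -> bytes
    return weight_bytes / 1e9 * overhead  # decimal GB plus runtime overhead

# An 8B model: ~16 GB of weights at 16-bit, ~4 GB at 4-bit quantization.
print(estimate_vram_gb(8e9, bits=16))  # ~19.2 with the overhead factor
print(estimate_vram_gb(8e9, bits=4))   # ~4.8
```

This is the same simple division and multiplication mentioned above: start from the 16-bit reference and scale by the bit width.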
When you say “capacity”, do you mean the size the model weights take up on the storage, or something else?
What’s the equation to calculate based on parameters? Does this apply only to transformers or also to other architectures such as diffusers?
the size the model weights take up on the storage
Yes.
What’s the equation to calculate based on parameters?
This also applies to Diffusers, but since SDXL and Flux, for example, are architectures that combine multiple sub-models, they are rarely described simply by a parameter count. Instead, each model architecture has a more or less fixed capacity.
SD1.5 is roughly 2 GB, SDXL about 7 GB, and Flux and SD3.5 a little over 30 GB at 16-bit.
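As a minimal sketch of what a displayed reference value would enable, here is how those ballpark 16-bit figures could be used as a lookup table. The names and the `estimate_pipeline_gb` helper are hypothetical, and real pipelines often quantize only the transformer/UNet rather than the whole pipeline, so treat the scaling as a rough bound.

```python
# Ballpark 16-bit pipeline sizes (GB) quoted in this thread; not official numbers.
BASE_FP16_GB = {"sd1.5": 2, "sdxl": 7, "flux": 30, "sd3.5": 30}

def estimate_pipeline_gb(name: str, bits: int = 16) -> float:
    """Scale the 16-bit reference size by the target bit width."""
    return BASE_FP16_GB[name] * bits / 16

print(estimate_pipeline_gb("flux", bits=4))  # ~7.5 GB if everything were 4-bit
```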
The following Posts will help you with the formula.