It’d be super helpful to require or at least suggest that AI models include system requirements, just like other software. Minimum and recommended specs, especially for inference with Hugging Face libraries, would make things easier. Hardware info is often hard to find, and not everyone has access to H100 clusters. Setting this as a standard would make models way more accessible.
HF is collecting feature requests for the site, and some of the suggestions there have already been implemented. For generative AI models, for example, it is difficult to determine an exact recommended value, because the VRAM required can drop to roughly a quarter with quantization. Even with that ambiguity, though, it would be easier if we had an approximate value.
Thanks for that link!
I’ve added the suggestion there too: link to suggestion post
If you think it’s a good idea, add a thumbs up; it might get more exposure.
I’m sure we could settle on some baseline; it could be the requirements for non-quantized inference of the default model.
As you say, having an approximate value would indeed be helpful.
If we have a baseline, we can estimate the other variants fairly well.
Of course I gave it a thumbs up. I try to gather as many people’s opinions and suggestions as possible, but HF gets the most accurate picture when many people make suggestions directly rather than through an intermediary. There are actually quite a few features that are easy to implement once someone has the idea. Some features are harder than others…
Quantization usually goes down to 4-bit, so as long as a reference value is displayed, the rest is simple division and multiplication. Recently, 2.5-bit quantization has appeared, but that is an exception.
At the moment, the capacity of a model stored in 16-bit float is roughly the amount of VRAM required to run it. You can also get a rough estimate from the number of parameters, such as 8B or 3B. But you have to know the conversion, and it’s a pain to remember. It’s like miles and yards for non-Americans; we’d be happier if it were displayed in meters.
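To make that concrete, here is a rough back-of-the-envelope sketch in Python. The `estimate_vram_gb` helper and the 20% runtime overhead factor are my own assumptions, not anything Hugging Face publishes; it just turns the parameter count and bit width into an approximate size.

```python
# Rough estimate of VRAM from parameter count and bits per weight.
# The 20% overhead factor (activations, CUDA context, etc.) is a loose
# assumption, not an official figure.
def estimate_vram_gb(n_params: float, bits: int = 16, overhead: float = 1.2) -> float:
    weight_bytes = n_params * bits / 8    # bits per parameter -> bytes
    return weight_bytes / 1e9 * overhead  # decimal GB plus runtime overhead

# An 8B model: ~16 GB of weights at 16-bit, ~4 GB at 4-bit quantization.
print(estimate_vram_gb(8e9, bits=16))  # ~19.2 with the overhead factor
print(estimate_vram_gb(8e9, bits=4))   # ~4.8
```

This is the same simple division and multiplication mentioned above: start from the 16-bit reference and scale by the bit width.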
When you say “capacity”, do you mean the size the model weights take up on the storage, or something else?
What’s the equation to calculate based on parameters? Does this apply only to transformers or also to other architectures such as diffusers?
the size the model weights take up on the storage
Yes.
What’s the equation to calculate based on parameters?
This also applies to Diffusers, but since SDXL and Flux, for example, are architectures that combine multiple sub-models, they are rarely described simply by a parameter count. Instead, each model architecture has a more or less fixed capacity.
SD1.5 is roughly 2 GB, SDXL about 7 GB, and Flux and SD3.5 a little over 30 GB at 16-bit.
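As a minimal sketch of what a displayed reference value would enable, here is how those ballpark 16-bit figures could be used as a lookup table. The names and the `estimate_pipeline_gb` helper are hypothetical, and real pipelines often quantize only the transformer/UNet rather than the whole pipeline, so treat the scaling as a rough bound.

```python
# Ballpark 16-bit pipeline sizes (GB) quoted in this thread; not official numbers.
BASE_FP16_GB = {"sd1.5": 2, "sdxl": 7, "flux": 30, "sd3.5": 30}

def estimate_pipeline_gb(name: str, bits: int = 16) -> float:
    """Scale the 16-bit reference size by the target bit width."""
    return BASE_FP16_GB[name] * bits / 16

print(estimate_pipeline_gb("flux", bits=4))  # ~7.5 GB if everything were 4-bit
```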
The following Posts will help you with the formula.