Identifying Original Models and Getting Parameter Counts

Using the HF API for listing models, are the following possible?

  1. Identify if a model is an original model (Not a reupload of an existing model)
  2. Get a parameter count for each model.

I am hoping to run some analysis based on this information.

Identify if a model is an original model (Not a reupload of an existing model)

The hash of each file is available as standard, but a hash alone makes it difficult to tell whether a model is the original. In many cases the original was never on HF in the first place, or has since disappeared (even the official SD 1.5 repo is gone). At best, originality can be estimated from the upload date and time and the number of downloads.
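A rough sketch of that estimate follows. It assumes you have already collected each repo's creation time and the SHA-256 hashes of its weight files. With `huggingface_hub`, `HfApi.model_info(repo_id, files_metadata=True)` should expose an `lfs.sha256` on each LFS entry in `siblings`, but verify this against your library version. The `RepoRecord` type and `presumed_originals` function are hypothetical. Repos that share any weight hash are grouped together, and the earliest upload in each group is presumed to be the original. This breaks down when the real original was never uploaded or has been deleted.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class RepoRecord:
    """Hypothetical per-repo record; populate it from the Hub API however you like."""
    repo_id: str
    created_at: datetime
    weight_hashes: FrozenSet[str]  # e.g. SHA-256 of each LFS weight file


def presumed_originals(records: List[RepoRecord]) -> Dict[str, str]:
    """Map each repo_id to the repo_id presumed to be its original.

    Repos that share any weight-file hash end up in one group (union-find),
    and the earliest-created repo in each group is treated as the original.
    """
    parent = {r.repo_id: r.repo_id for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    # Union each repo with the first repo seen carrying the same hash.
    first_with_hash: Dict[str, str] = {}
    for r in records:
        for h in r.weight_hashes:
            if h in first_with_hash:
                parent[find(r.repo_id)] = find(first_with_hash[h])
            else:
                first_with_hash[h] = r.repo_id

    # Within each group, the earliest upload wins.
    groups: Dict[str, List[RepoRecord]] = defaultdict(list)
    for r in records:
        groups[find(r.repo_id)].append(r)

    result: Dict[str, str] = {}
    for members in groups.values():
        earliest = min(members, key=lambda m: m.created_at)
        for m in members:
            result[m.repo_id] = earliest.repo_id
    return result


records = [
    RepoRecord("org/base", datetime(2023, 1, 1), frozenset({"aaa", "bbb"})),
    RepoRecord("user/reupload", datetime(2023, 6, 1), frozenset({"aaa"})),
    RepoRecord("other/model", datetime(2023, 3, 1), frozenset({"ccc"})),
]
print(presumed_originals(records))
# {'org/base': 'org/base', 'user/reupload': 'org/base', 'other/model': 'other/model'}
```

Grouping is transitive: if A shares a file with B and B shares another with C, all three land in one group. That is usually what you want for chains of reuploads, but a widely reused file (a shared tokenizer, say) could merge unrelated repos. Restricting the hashes to weight files reduces that risk.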

Get a parameter count for each model.

File names often indicate whether the weights are quantised, and if they are not quantised, the parameter count can be estimated from the file size.
In many cases, though, the repo name already states the size (e.g. 8B or 70B).
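Both estimates fit in a few lines. The sketch below assumes unquantised weights at a known precision (4 bytes per parameter for FP32, 2 for FP16/BF16) and ignores file-header overhead, so the result is approximate. The regexes are hypothetical heuristics, not Hub conventions. One matches size tags such as `8B`/`70B`. The other matches quantisation hints such as `GGUF`, `GPTQ` or `4bit`. Both will misread some names, for example `8x7B`.

```python
import re
from typing import Optional

# Bytes per parameter for common unquantised storage precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2}

# Heuristic: a size token like "8B", "70b", "0.5B" or "135M" standing on its own
# (not glued to other letters/digits). Misses names such as "8x7B".
_SIZE_TAG = re.compile(r"(?<![A-Za-z0-9.])(\d+(?:\.\d+)?)([BbMm])(?![A-Za-z0-9])")

# Heuristic list of name fragments that usually signal quantised weights.
_QUANT_HINT = re.compile(r"gguf|gptq|awq|exl2|bnb|int[48]|(?<!\d)[48]-?bit", re.IGNORECASE)


def looks_quantised(name: str) -> bool:
    """Guess from a repo or file name whether the weights are quantised."""
    return _QUANT_HINT.search(name) is not None


def params_from_file_size(total_bytes: int, dtype: str = "bf16") -> float:
    """Approximate parameter count of unquantised weights from their total size."""
    return total_bytes / BYTES_PER_PARAM[dtype]


def params_from_repo_name(repo_id: str) -> Optional[float]:
    """Parse a size tag such as '8B' or '70B' from the repo name, if present."""
    match = _SIZE_TAG.search(repo_id.split("/")[-1])
    if match is None:
        return None
    value, unit = float(match.group(1)), match.group(2).lower()
    return value * (1e9 if unit == "b" else 1e6)


print(params_from_repo_name("meta-llama/Meta-Llama-3-8B"))  # 8000000000.0
print(looks_quantised("unsloth/llama-3-8b-bnb-4bit"))        # True
print(params_from_file_size(16_000_000_000, "bf16"))        # 8000000000.0
```

Since the name and the file size are independent signals, a reasonable design is to trust the name tag when present and fall back on the size estimate only for files that do not look quantised.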

If you want to request a feature for the Hub, you can reach out to the following people:

I see. My fallback plan was to just threshold the results on the number of likes and downloads, and assume that if a model is popular enough then it is at least genuine. (I'll take less noise over perfection.)
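That fallback is easy to express as a filter. `huggingface_hub`'s `ModelInfo` objects (e.g. those yielded by `HfApi().list_models()`) should carry `likes` and `downloads` attributes. The function below accepts any objects with those two attributes, so it can be tried without network access. The threshold defaults and the `FakeModel` stand-in are arbitrary placeholders, not recommendations.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Protocol


class HasPopularity(Protocol):
    likes: Optional[int]
    downloads: Optional[int]


def popular_enough(
    models: Iterable[HasPopularity], min_likes: int = 10, min_downloads: int = 1_000
) -> Iterator[HasPopularity]:
    """Yield models meeting both thresholds; missing counts are treated as 0."""
    for m in models:
        if (m.likes or 0) >= min_likes and (m.downloads or 0) >= min_downloads:
            yield m


@dataclass
class FakeModel:  # stand-in for huggingface_hub.ModelInfo in this example
    id: str
    likes: Optional[int]
    downloads: Optional[int]


sample = [FakeModel("a/x", 50, 20_000), FakeModel("b/y", 2, 90_000), FakeModel("c/z", None, 5_000)]
print([m.id for m in popular_enough(sample)])  # ['a/x']
```

Because it is a generator, it can wrap the (lazy) iterator returned by `list_models` directly and will not hold every result in memory.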

As for model size, I noticed that the model card usually shows a parameter count, so I thought it might be readily available somewhere in the API.

the model card usually has a parameter count

All the information on the card is available via the API; it is just a matter of setting cardData=True.
However, only a few cards are filled in accurately. If you are only looking for official models, I think that would still work fine.

Ahh, I see. As for estimating via file size, is it possible to do this without needing to pull the files? And where can I find information on quantisation?

is it possible to do this without needing to pull the files?

Yes, it's possible. Along with each file's hash, you can also get its filename.
Files larger than 10 GB are split into several parts, which is a bit tricky, but the shard filenames follow a predefined naming rule, so it is manageable.
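Sharded checkpoints can be handled by matching the `-NNNNN-of-NNNNN` suffix that Transformers uses, as in `model-00001-of-00004.safetensors`. File sizes are also available without downloading: `HfApi.model_info(repo_id, files_metadata=True)` should report `rfilename` and `size` for each entry in `siblings`. The sketch below takes plain `(filename, size_in_bytes)` pairs, groups shards by prefix, checks that every shard is present, and sums their sizes. The function name, input format and extension list are hypothetical choices.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Matches e.g. "model-00001-of-00004.safetensors" -> stem "model", idx 1, total 4.
_SHARD = re.compile(r"^(?P<stem>.+)-(?P<idx>\d{5})-of-(?P<total>\d{5})\.[^.]+$")


def weight_sizes(
    files: Iterable[Tuple[str, int]],
    exts: Tuple[str, ...] = ("safetensors", "bin", "gguf"),
) -> Dict[str, int]:
    """Return total bytes per logical weight file, merging complete shard sets.

    Unsharded files keep their own name; shard sets are reported as
    "<stem>.<ext>". An incomplete shard set raises ValueError.
    """
    totals: Dict[str, int] = {}
    shards: Dict[Tuple[str, str, int], Dict[int, int]] = defaultdict(dict)
    for name, size in files:
        ext = name.rsplit(".", 1)[-1]
        if ext not in exts:
            continue  # skip configs, tokenizers, index JSON, README, ...
        m = _SHARD.match(name)
        if m:
            shards[(m["stem"], ext, int(m["total"]))][int(m["idx"])] = size
        else:
            totals[name] = size
    for (stem, ext, total), parts in shards.items():
        if sorted(parts) != list(range(1, total + 1)):
            raise ValueError(f"{stem}.{ext}: expected {total} shards, found {sorted(parts)}")
        totals[f"{stem}.{ext}"] = sum(parts.values())
    return totals


files = [
    ("config.json", 700),
    ("model-00001-of-00002.safetensors", 5_000),
    ("model-00002-of-00002.safetensors", 3_000),
    ("model.safetensors.index.json", 20),
]
print(weight_sizes(files))  # {'model.safetensors': 8000}
```

Repos often ship the same weights in several formats (e.g. both `.bin` and `.safetensors`), so you would likely pick one format per repo rather than adding everything up.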

where can I find information on quantisation?

The right-hand side of the model page clearly lists data that the server has parsed from the files. Perhaps this information is also available via the API. But come to think of it, I have never checked whether it can actually be retrieved…
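For safetensors files, the parameter count does not require downloading the weights. The format begins with an 8-byte little-endian unsigned integer giving the length of a JSON header, and that header lists every tensor's `dtype` and `shape`. Presumably the Hub derives its sidebar figures from the same data. Recent `huggingface_hub` versions also appear to expose it via `HfApi.get_safetensors_metadata`, which is worth checking before rolling your own. The sketch below parses a header from raw bytes and sums element counts per dtype. It assumes you fetch only the first bytes of the file yourself, for instance with an HTTP `Range` request; that step is not shown. Integer dtypes such as `I8`, `U8` or `I32` are only a hint of quantisation, and packed formats store several low-bit weights per element, so raw element counts can undercount parameters.

```python
import json
import struct
from collections import Counter
from math import prod
from typing import Dict


def safetensors_param_counts(raw: bytes) -> Dict[str, int]:
    """Count tensor elements per dtype from the start of a .safetensors file.

    `raw` must contain the 8-byte length prefix plus the full JSON header;
    the tensor data that follows is not needed.
    """
    if len(raw) < 8:
        raise ValueError("need at least 8 bytes for the header length")
    (header_len,) = struct.unpack("<Q", raw[:8])
    if len(raw) < 8 + header_len:
        raise ValueError(f"header is {header_len} bytes; fetch more data")
    header = json.loads(raw[8 : 8 + header_len])
    counts: Counter = Counter()
    for name, info in header.items():
        if name == "__metadata__":  # optional free-form string map, not a tensor
            continue
        counts[info["dtype"]] += prod(info["shape"])  # prod([]) == 1 for scalars
    return dict(counts)


# Build a tiny in-memory file to try it out.
hdr = json.dumps({
    "__metadata__": {"format": "pt"},
    "w": {"dtype": "BF16", "shape": [4, 3], "data_offsets": [0, 24]},
    "b": {"dtype": "BF16", "shape": [3], "data_offsets": [24, 30]},
    "s": {"dtype": "F32", "shape": [], "data_offsets": [30, 34]},
}).encode()
raw = struct.pack("<Q", len(hdr)) + hdr + b"\x00" * 34
print(safetensors_param_counts(raw))  # {'BF16': 15, 'F32': 1}
```

For a sharded checkpoint you would run this on each shard's header and add the counters together.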
