Sort models by parameter count

idkthrowaway123123 · August 28, 2024, 4:55am

I’m trying to find out what my poor 4070 Ti can handle when fine-tuning LLMs. So far I’ve narrowed it down: GPT2-Medium (345M) is fine, but GPT2-Large (774M) is too much. I still want to fine-tune something, so it would be very helpful if the website let you sort models by parameter count, so I can filter quickly. As it stands, I had to search for a long time to discover that Bloom has a 560M variant!

That is all, thank you!

mahmutc · August 28, 2024, 8:03am

hi @idkthrowaway123123
I hope the API can help you a bit with this:

https://huggingface.co/api/models?filter=safetensors returns all models with the safetensors tag. The following returns the safetensors parameter count for a single model:
https://huggingface.co/api/models/model_id_x

example:
https://huggingface.co/api/models/google-bert/bert-base-german-cased

You can add more tags and other parameters to filter better:
https://huggingface.co/api/models?pipeline_tag=fill-mask&filter=safetensors
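The URLs above can also be built and queried from a script. This is only a rough sketch using the Python standard library: the helper names (`list_url`, `model_url`, `fetch_json`, `safetensors_total`) are made up for illustration, and it assumes the per-model response carries the count under `safetensors` → `total`, which may not hold for models without safetensors weights.

```python
import json
from typing import Optional
from urllib.parse import quote, urlencode
from urllib.request import urlopen

API_ROOT = "https://huggingface.co/api/models"


def list_url(**params: str) -> str:
    """Build a listing URL such as .../api/models?pipeline_tag=fill-mask&filter=safetensors."""
    return f"{API_ROOT}?{urlencode(params)}" if params else API_ROOT


def model_url(model_id: str) -> str:
    """Build the per-model URL; the slash between owner and name is kept."""
    return f"{API_ROOT}/{quote(model_id, safe='/')}"


def fetch_json(url: str):
    """GET a URL and decode the JSON body (needs network access)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)


def safetensors_total(payload: dict) -> Optional[int]:
    """Return the total parameter count, or None when the model reports none.

    Assumption: the count sits under payload["safetensors"]["total"].
    """
    info = payload.get("safetensors") or {}
    return info.get("total")
```

Usage would look something like `safetensors_total(fetch_json(model_url("google-bert/bert-base-german-cased")))`.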

I know it’s not exactly what you want

mahmutc · August 28, 2024, 8:40am

A better solution with the Hub Python Library (huggingface_hub):

Filtering models: list_models

List repo files: list_repo_files

Get file url: hf_hub_url

And get file metadata (which contains size): get_hf_file_metadata

idkthrowaway123123 · August 29, 2024, 12:40am

Nice! I’m new to this place so I didn’t know about this yet, but I will check it out. Thanks a bunch!

mahmutc · August 29, 2024, 5:33pm

You can add a filter option to list_models and change or add conditions on the file size.

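from huggingface_hub import list_models, list_repo_files, hf_hub_url, get_hf_file_metadata

MB = 1_000_000

# With no arguments this iterates over every model on the Hub; pass filters to narrow it down.
models = list_models()

for model in models:
  files = list_repo_files(model.id)
  for file in files:
    if file.endswith(".bin"):
      # Only the file's metadata is requested; the file itself is not downloaded.
      file_size = get_hf_file_metadata(hf_hub_url(model.id, file)).size
      # size can be missing, so guard before comparing
      if file_size is not None and file_size // MB > 1000:  # file bigger than 1 GB
        print(model)
        print(f"{file_size // MB} MB")
        print("https://huggingface.co/" + model.id)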

You can also filter by download count, likes, etc. Each match prints a ModelInfo like this:

ModelInfo(id='google-bert/bert-large-uncased', author=None, sha=None, created_at=datetime.datetime(2022, 3, 2, 23, 29, 4, tzinfo=datetime.timezone.utc), last_modified=None, private=False, gated=None, disabled=None, downloads=2524614, downloads_all_time=None, likes=108, library_name='transformers', tags=['transformers', 'pytorch', 'tf', 'jax', 'rust', 'safetensors', 'bert', 'fill-mask', 'en', 'dataset:bookcorpus', 'dataset:wikipedia', 'arxiv:1810.04805', 'license:apache-2.0', 'autotrain_compatible', 'endpoints_compatible', 'region:us'], pipeline_tag='fill-mask', mask_token=None, card_data=None, widget_data=None, model_index=None, config=None, transformers_info=None, siblings=None, spaces=None, safetensors=None)
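To get from file sizes back to the original question (sorting by parameter count), one rough option is to estimate the parameter count from the checkpoint size and sort on that. The sketch below is not part of huggingface_hub: `estimate_params` and `sort_within_budget` are hypothetical helpers, and they assume a single-file checkpoint stored as 32-bit floats (4 bytes per parameter). A half-precision checkpoint would need 2 bytes per parameter, and sharded checkpoints would need their file sizes summed first.

```python
from typing import Dict, List, Tuple

BYTES_PER_PARAM_FP32 = 4  # assumption: weights stored as 32-bit floats


def estimate_params(file_size_bytes: int, bytes_per_param: int = BYTES_PER_PARAM_FP32) -> int:
    """Rough parameter count implied by a checkpoint's size on disk."""
    return file_size_bytes // bytes_per_param


def sort_within_budget(sizes: Dict[str, int], max_params: int) -> List[Tuple[str, int]]:
    """Keep models at or under max_params, ordered smallest first."""
    estimates = ((model_id, estimate_params(size)) for model_id, size in sizes.items())
    return sorted(
        (pair for pair in estimates if pair[1] <= max_params),
        key=lambda pair: pair[1],
    )
```

Feeding it a dict of model id to file size in bytes, collected with the loop above, and a budget such as `500_000_000` would return only the models that fit, smallest first. The estimate ignores any non-weight data stored in the file, so treat it as a ballpark figure.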
