Bug in models filtering by dataset?

arubique · March 13, 2025, 9:55am

Hello everyone,

I noticed a potential bug in the Hugging Face web interface.

I want to filter models down to those pre-trained or fine-tuned on a specific dataset, but the results of this filtering are inconsistent.

To demonstrate this, let’s use the imdb dataset. On the dataset page, I can see the first 6 results of this filter in the “Models trained or fine-tuned on stanfordnlp/imdb” section (please see the left part of the screenshot; the left and right parts are separated by the vertical dashed line).

However, when I click the link “Browse 1407 models trained on this dataset” (it has the form https://huggingface.co/models?dataset=dataset:stanfordnlp/imdb), a search with the filter applied opens. That search returns only 81 models (please see the right part of the screenshot).

I think it is a bug because the number of models found in the right part of the screenshot (81) is inconsistent with the 1407 models mentioned in the link title in the left part of the screenshot.

Could you please confirm whether it is a bug and suggest solutions that would allow me to see the names of all 1407 models mentioned in the left part of the screenshot (now I can see only 6 names that are explicitly shown there)?

Thank you in advance for your help!

John6666 · March 13, 2025, 2:48pm

I think some datasets that can be referenced without an author name have their models split across the different names like this, whether that’s a bug in the Hub or a feature.

arubique · March 13, 2025, 2:59pm

Oh, I see, thanks! In the IMDB case, I should also filter by dataset:imdb, in addition to the stanfordnlp/imdb filter used by default. That finds 1326 more models on top of the 81 I found before with stanfordnlp/imdb. Together they add up to the 1326 + 81 = 1407 models mentioned on the dataset page. Now it makes sense, thank you!
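In case it helps anyone else, here is a rough sketch of that two-tag lookup in Python. It assumes the second tag is simply the part of the dataset ID after the slash (true for stanfordnlp/imdb, not guaranteed for other datasets), and the helper names are made up for illustration. The live query relies on `HfApi.list_models` from `huggingface_hub`, which is imported lazily so the other helpers work without it.

```python
from typing import Iterable


def dataset_tags(repo_id: str) -> list[str]:
    """Filter tags to try: the full dataset ID and, if namespaced, the bare name.

    Assumption: the bare name is whatever follows the first slash.
    """
    tags = [f"dataset:{repo_id}"]
    if "/" in repo_id:
        tags.append(f"dataset:{repo_id.split('/', 1)[1]}")
    return tags


def browse_url(tag: str) -> str:
    """Search URL in the same form as the link on the dataset page."""
    return f"https://huggingface.co/models?dataset={tag}"


def merge_model_ids(results: Iterable[Iterable[str]]) -> set[str]:
    """Union of model IDs from several searches; duplicates are counted once."""
    merged: set[str] = set()
    for ids in results:
        merged.update(ids)
    return merged


def list_models_for_dataset(repo_id: str) -> set[str]:
    """Query the Hub once per tag (needs network access and huggingface_hub)."""
    from huggingface_hub import HfApi  # lazy import: only needed for live queries

    api = HfApi()
    per_tag = [[m.id for m in api.list_models(filter=tag)] for tag in dataset_tags(repo_id)]
    return merge_model_ids(per_tag)


if __name__ == "__main__":
    for tag in dataset_tags("stanfordnlp/imdb"):
        print(browse_url(tag))
```

Because the results are merged as a set, a model tagged with both names is counted once, so the total may come out lower than the sum of the two search counts.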

I still think it is a bug, because the number of models I find when following the link from the dataset page (81) is inconsistent with the number of models in the title of that link (1407).

John6666 · March 13, 2025, 3:27pm

I think it’s worth raising an issue about this either way. I don’t know if it’s a bug or a feature, but at the very least, it can’t be called the desired behavior…

