Centralized Benchmarks

Are there any plans to centralize benchmarks for datasets or models? There are some links to papers and code for datasets, and there is also a sentence transformers benchmark.

New models are added often, which is great, but it’s hard to track how they perform, since each paper tends to cherry-pick the examples it compares against.

cc @lewtun, this sounds related to auto-evaluation?

Thanks for the ping @lhoestq !

Yes @denyslinkov, we’re currently working on tooling that should make it much easier to run large-scale evaluations and model comparisons across the Hub. Stay tuned :slight_smile:

