Currently I’m trying to get the LM evaluation harness running, without success. I was curious whether there is an easy way to benchmark or evaluate pre-trained generative text models inside the Hugging Face library. I’m sorry if this is really obvious.
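
For context, this is roughly the kind of thing I was hoping exists, a minimal sketch using the perplexity metric from the `evaluate` library (I haven’t verified this end to end, and `gpt2` here just stands in for whatever model I’d actually want to benchmark):

```python
# Minimal sketch: computing perplexity of a pretrained causal LM with the
# Hugging Face `evaluate` library. Assumes `pip install evaluate transformers
# torch`; "gpt2" is a placeholder model id.
import evaluate

perplexity = evaluate.load("perplexity", module_type="metric")

texts = [
    "The quick brown fox jumps over the lazy dog.",
    "Generative language models are often evaluated with perplexity.",
]

results = perplexity.compute(model_id="gpt2", predictions=texts)
print(results["mean_perplexity"])  # average perplexity over the inputs
print(results["perplexities"])     # per-example perplexity scores
```

If something like this is the intended workflow, or if the lm-evaluation-harness is still the recommended route for generative models, a pointer either way would be appreciated.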