Causal LLM benchmarks

Currently I’m trying to get the LM evaluation harness running, without success. I was curious whether there is an easy way to benchmark or evaluate pre-trained generative text models inside the Hugging Face library; a rough sketch of what I mean is below. I’m sorry if this is really obvious.
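
To make the question concrete, here is a minimal sketch of the kind of evaluation I have in mind: computing perplexity for a single string with transformers. The model name "gpt2" and the example sentence are just placeholders, not my actual setup.

```python
# Minimal perplexity check for a causal LM with transformers.
# "gpt2" and the example sentence are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

text = "The quick brown fox jumps over the lazy dog."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing the input ids as labels makes the model return the
    # (shifted) cross-entropy loss over the sequence.
    outputs = model(**inputs, labels=inputs["input_ids"])

perplexity = torch.exp(outputs.loss)
print(f"perplexity: {perplexity.item():.2f}")
```

Ideally I’d like something along these lines, but run over a standard benchmark dataset rather than a single string, if such a thing already exists in the Hugging Face ecosystem.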