Pruned Llama on lm-evaluation-harness

Hi everybody,

I’m trying to assess the performance of pruned versions of different Llama models with EleutherAI/lm-evaluation-harness. (At the moment I’m just deleting layers, inspired by short-transformers/short_transformers/short_transformer.py at main · melisa-writer/short-transformers · GitHub )
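For context, by “deleting layers” I mean dropping entries from the model’s decoder `ModuleList` (in a Llama checkpoint that would be `model.model.layers`). A minimal sketch of the idea, using toy blocks instead of a real Llama so it stays self-contained:

```python
import torch.nn as nn

class TinyBlock(nn.Module):
    # stand-in for a decoder layer; a real Llama layer is much larger
    def __init__(self):
        super().__init__()
        self.lin = nn.Linear(8, 8)

    def forward(self, x):
        return self.lin(x)

def drop_layers(layers: nn.ModuleList, to_remove: set) -> nn.ModuleList:
    """Return a new ModuleList without the layers at the given indices."""
    return nn.ModuleList(
        layer for i, layer in enumerate(layers) if i not in to_remove
    )

layers = nn.ModuleList(TinyBlock() for _ in range(4))
pruned = drop_layers(layers, {2})
print(len(pruned))  # 3
```

On a real checkpoint you would also update `model.config.num_hidden_layers` to match the new depth, so downstream code that reads the config stays consistent.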

But when I load a pruned model I get "Passed an already-initialized model through pretrained, assuming single-process call to evaluate() or custom distributed integration", and the benchmarks take hours instead of minutes, even though the model is, for example, just a Llama with one layer removed.

Do you have any suggestions on how to fix this issue?