Hi everybody,
I’m trying to assess the performance of pruned versions of different Llama models with EleutherAI/lm-evaluation-harness. (At the moment I’m just deleting layers, inspired by short-transformers/short_transformers/short_transformer.py at main · melisa-writer/short-transformers · GitHub.) A rough sketch of what I do is below.
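Roughly, the pruning step looks something like this (simplified sketch, my actual code follows the short-transformers approach more closely; the model name and layer index are just examples):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example model name; I use several Llama variants in practice.
model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Drop one decoder layer (here the last one) and keep the config consistent.
layers = model.model.layers  # nn.ModuleList of decoder layers
del layers[-1]
model.config.num_hidden_layers = len(layers)
```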
But when I load a pruned model I get the warning "Passed an already-initialized model through `pretrained`, assuming single-process call to evaluate() or custom distributed integration", and the benchmarks take hours instead of minutes, even though the model is, for example, just a Llama with one layer removed.
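For reference, this is roughly how I pass the pruned model to the harness (sketch; the task and batch size are just examples). Handing over the already-built model object is what triggers the warning above:

```python
import lm_eval
from lm_eval.models.huggingface import HFLM

# Wrap the in-memory pruned model instead of passing a model name string.
lm = HFLM(pretrained=model, tokenizer=tokenizer, batch_size=8)
results = lm_eval.simple_evaluate(model=lm, tasks=["hellaswag"])
print(results["results"])
```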
Do you have any suggestions on how to fix this issue?