Hi everybody,
I’m trying to assess the performance of pruned versions of different Llama models with EleutherAI/lm-evaluation-harness. (At the moment I’m just deleting layers, inspired by short-transformers/short_transformers/short_transformer.py at main · melisa-writer/short-transformers · GitHub.) A rough sketch of what I do is below.
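Roughly, the pruning step looks something like this (simplified sketch, my actual code follows the short-transformers approach more closely; the model name and layer index are just examples):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Example model name; I use several Llama variants in practice.
model_name = "meta-llama/Llama-2-7b-hf"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Drop one decoder layer (here the last one) and keep the config consistent.
layers = model.model.layers  # nn.ModuleList of decoder layers
del layers[-1]
model.config.num_hidden_layers = len(layers)
```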
But when I load a pruned model I get the warning "Passed an already-initialized model through `pretrained`, assuming single-process call to evaluate() or custom distributed integration", and the benchmarks take hours instead of minutes, even though the model is, for example, just a Llama with one layer removed.
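For reference, this is roughly how I pass the pruned model to the harness (sketch; the task and batch size are just examples). Handing over the already-built model object is what triggers the warning above:

```python
import lm_eval
from lm_eval.models.huggingface import HFLM

# Wrap the in-memory pruned model instead of passing a model name string.
lm = HFLM(pretrained=model, tokenizer=tokenizer, batch_size=8)
results = lm_eval.simple_evaluate(model=lm, tasks=["hellaswag"])
print(results["results"])
```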
Do you have any suggestions on how to fix this issue?