When I use the same prompts, I find that the Hugging Face generator (`AutoModelForCausalLM.from_pretrained(...).generate()`) is about six times slower than the original LLaMA 2 generator (`LLaMA.generate()`). Why is this, and how can I speed it up?
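For reference, here is a minimal sketch of the Hugging Face side of the comparison (the checkpoint name and generation settings are my assumptions, not taken from any particular setup). A common cause of this kind of slowdown is loading the model in fp32 and/or on CPU, whereas the reference LLaMA code runs in fp16 on CUDA by default; loading in half precision on the GPU usually closes most of the gap:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # hypothetical checkpoint; substitute yours

tokenizer = AutoTokenizer.from_pretrained(model_id)

# A plain fp32 load like this is a frequent cause of slow generation:
# model = AutoModelForCausalLM.from_pretrained(model_id)

# Load in half precision and move to the GPU, matching the reference code:
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
).to("cuda")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
output = model.generate(
    **inputs,
    max_new_tokens=128,
    use_cache=True,  # KV cache; on by default, but worth confirming
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```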