While running the llama and falcon pipelines, I found that Llama-2 is over 30x slower than falcon for the same size (7b).
Is this something that is to be expected?
Did you figure out why? I found that Llama-2-7b is much slower and requires a smaller batch size than OPT-6.7b, even though they are almost the same size.
Haven't figured it out yet.
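For anyone trying to reproduce this, it helps to time the models with an identical harness rather than relying on each pipeline's defaults (generation length, sampling settings, and dtype can differ per model and inflate apparent gaps). Below is a minimal, hypothetical timing helper: `generate` is any callable you supply that wraps a model (e.g. a `transformers` pipeline call with a fixed `max_new_tokens`); the helper itself only measures throughput.

```python
import time

def tokens_per_second(generate, prompt, n_tokens, n_runs=3):
    """Return generation throughput in tokens/sec for a generate callable.

    `generate(prompt, n_tokens)` is assumed to produce exactly n_tokens
    new tokens per call (e.g. a wrapper around model.generate with
    max_new_tokens=n_tokens and do_sample=False for determinism).
    """
    # Warm-up run so one-time costs (CUDA kernel compilation,
    # cache allocation) are excluded from the measurement.
    generate(prompt, n_tokens)
    start = time.perf_counter()
    for _ in range(n_runs):
        generate(prompt, n_tokens)
    elapsed = time.perf_counter() - start
    return n_runs * n_tokens / elapsed
```

Wrapping each model (Llama-2-7b, Falcon-7b, OPT-6.7b) in such a callable with the same prompt, token count, dtype, and batch size gives directly comparable numbers; if a 30x gap does not survive this controlled setup, the slowdown is coming from pipeline configuration rather than the model itself.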