Llama-2 significantly slower than other models on Hugging Face

While running the Llama and Falcon pipelines, I found that Llama-2 is over 30x slower than Falcon at the same size (7B).

Is this expected?

Did you figure out why? I found that Llama-2-7b is much slower and requires a smaller batch size than OPT-6.7b, even though they are almost the same size.
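To pin a slowdown like this down, it may help to measure generated tokens per second for each model under identical settings. Below is a minimal sketch of such a timing helper. The names `tokens_per_second` and `fake_generate` are hypothetical and not part of any library; `fake_generate` is only a stand-in so the snippet runs without downloading a model.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(
    generate: Callable[[str], Sequence[int]],
    prompt: str,
    warmup: int = 1,
    runs: int = 3,
) -> float:
    """Return the average number of generated tokens per second.

    `generate` is any callable that takes a prompt and returns the
    generated token ids (hypothetical interface, for illustration only).
    """
    # Warm-up calls are excluded from timing so one-time costs
    # (lazy initialization, caching) do not distort the result.
    for _ in range(warmup):
        generate(prompt)

    total_tokens = 0
    start = time.perf_counter()
    for _ in range(runs):
        total_tokens += len(generate(prompt))
    elapsed = time.perf_counter() - start
    return total_tokens / elapsed


def fake_generate(prompt: str) -> list:
    """Stand-in for a real model call: sleeps briefly, returns 20 token ids."""
    time.sleep(0.01)
    return list(range(20))


if __name__ == "__main__":
    rate = tokens_per_second(fake_generate, "Hello", warmup=1, runs=3)
    print(f"{rate:.1f} tokens/s")
```

To compare two real models, one would replace `fake_generate` with a wrapper around each model's generation call, keeping the prompt, number of new tokens, dtype, device, and batch size the same for both. On a GPU, the wrapper should wait for the device to finish (for example with `torch.cuda.synchronize()`) before returning; otherwise the timing can be misleading.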