Inference speed

Hello,

Does anyone have a benchmark of inference speed for Llama 2 running on Hugging Face Spaces?

A related question: do models hosted through Hugging Face Inference Endpoints achieve higher tokens/s than a Space running on the same GPU?
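
For context, this is roughly how I'm thinking of measuring tokens/s. This is just a minimal local sketch with the `transformers` library; the model ID and prompt are placeholders, not the exact setup used on Spaces or Endpoints:

```python
# Minimal sketch for measuring generation throughput (tokens/s) locally.
# Assumes GPU access and an accepted Llama 2 license; model ID is a placeholder.
import time
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain the difference between Spaces and Inference Endpoints."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

# Count only newly generated tokens, not the prompt tokens.
new_tokens = outputs.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens / elapsed:.1f} tokens/s")
```

So what I'd like to compare is that tokens/s number for the same model and GPU, once behind a Space and once behind an Inference Endpoint.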

Thanks