Hello,
Does anyone have inference-speed benchmarks (tokens/s) for Llama 2 running on Hugging Face Spaces?
Another related question: do models hosted through Hugging Face Inference Endpoints achieve higher tokens/s than HF Spaces using the same GPU?
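For context, this is roughly how I'd measure tokens/s myself so numbers are comparable — a minimal sketch assuming the transformers library and the (gated) meta-llama/Llama-2-7b-chat-hf checkpoint; the prompt and generation settings are just illustrative:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; gated, requires approved access on the Hub.
model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain the difference between a GPU and a CPU in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Warm-up pass so one-time CUDA initialization doesn't skew the timing.
model.generate(**inputs, max_new_tokens=16)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

# Count only newly generated tokens, not the prompt tokens.
new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.1f} tokens/s")
```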
Thanks