Hello,
Does anyone have inference-speed benchmarks (tokens/s) for Llama 2 running on Hugging Face Spaces?
Another related question: do models hosted through Hugging Face Inference Endpoints achieve higher tokens/s than HF Spaces using the same GPU?
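For context, this is roughly how I'd measure tokens/s myself so numbers are comparable — a minimal sketch assuming the transformers library and the (gated) meta-llama/Llama-2-7b-chat-hf checkpoint; the prompt and generation settings are just illustrative:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative checkpoint; gated, requires approved access on the Hub.
model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Explain the difference between a GPU and a CPU in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Warm-up pass so one-time CUDA initialization doesn't skew the timing.
model.generate(**inputs, max_new_tokens=16)

start = time.perf_counter()
output = model.generate(**inputs, max_new_tokens=256, do_sample=False)
elapsed = time.perf_counter() - start

# Count only newly generated tokens, not the prompt tokens.
new_tokens = output.shape[-1] - inputs["input_ids"].shape[-1]
print(f"{new_tokens} tokens in {elapsed:.2f}s -> {new_tokens / elapsed:.1f} tokens/s")
```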
Thanks