Hi guys,
I mainly use Hugging Face to build AI agents with Streamlit and LangChain and deploy them on Spaces.
Today, I would like to use a Space to deploy an open model (Mixtral-8x7B-Instruct-v0.1) for a client.
The goal is to make it accessible through an API (FastAPI), like other providers (OpenAI, Mistral, …).
That part is fine, but I would like to add GPU hardware by upgrading the Space.
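For reference, here is a minimal sketch of what I have in mind for the API layer: a FastAPI app that loads the model with 4-bit quantization and exposes one generation endpoint. The endpoint name, request schema, and generation settings are just placeholders, and this assumes the chosen GPU has enough memory for the 4-bit weights and that bitsandbytes is installed.

```python
# Minimal sketch: serve Mixtral-8x7B-Instruct-v0.1 behind a FastAPI endpoint on a Space.
# Assumes the Space GPU can hold the 4-bit quantized weights (bitsandbytes required).
import torch
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

MODEL_ID = "mistralai/Mixtral-8x7B-Instruct-v0.1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)

app = FastAPI()


class GenerateRequest(BaseModel):
    prompt: str
    max_new_tokens: int = 256


@app.post("/generate")
def generate(req: GenerateRequest):
    # Format the prompt with the model's chat template (instruction format)
    messages = [{"role": "user", "content": req.prompt}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    output = model.generate(inputs, max_new_tokens=req.max_new_tokens)
    # Decode only the newly generated tokens, not the prompt
    text = tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True)
    return {"generated_text": text}
```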
How can I choose the right settings so I don't end up with an expensive bill?
Is it better to use dedicated hardware (more expensive, but more secure and without cold-start latency) or a ZeroGPU subscription (which adds latency)?
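In case it helps frame the cost question, here is a rough sketch of how I thought I could cap the bill with huggingface_hub: request the hardware tier and set a sleep timeout so the paid GPU pauses when the API is idle. The Space id and hardware tier below are just example values.

```python
# Rough sketch: request paid hardware and an idle sleep timeout for the Space
# so it is not billed 24/7. Space id and hardware tier are example values.
from huggingface_hub import HfApi

api = HfApi()  # picks up the HF token from the environment / local login

SPACE_ID = "my-org/mixtral-api"  # hypothetical Space repo id

# Request a GPU tier (the name should match one from the Spaces pricing page)
api.request_space_hardware(repo_id=SPACE_ID, hardware="a100-large")

# Pause the Space after one hour without requests to limit billing
api.set_space_sleep_time(repo_id=SPACE_ID, sleep_time=3600)
```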
Thanks