To use Llama 3.1-405B, do I have to rent a server, or can I send my API requests to someone else's server and pay them through HF?

I’m a complete beginner (I just signed up to HF and am trying to learn how it can be used).

I’d like to use the Llama 3.1 405B model via API requests, and I wonder whether HF lets people who already own servers give other users access to them through API requests. Is that possible?

If so, where on HF’s site can I find such servers?

It seems HF doesn’t allow the 405B model to be used through the serverless Inference API: the model weighs in at roughly 900 GB, while the serverless limit is 10 GB.

Thank you.

You can set up a dedicated Inference Endpoint (Inference Endpoints - Hugging Face). You’ll need to request a quota increase, though, so you can access enough GPUs to actually run the model. If 405B-Instruct is sufficient and you don’t need the base model, you can also set up an API endpoint with AWS Bedrock to access it (Build Generative AI Applications with Foundation Models - Amazon Bedrock - AWS).
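For reference, here’s a minimal sketch of querying a dedicated Inference Endpoint once it’s deployed, using the `huggingface_hub` client. The endpoint URL and token below are placeholders you’d copy from your own endpoint’s page and account settings:

```python
# pip install huggingface_hub
from huggingface_hub import InferenceClient

# Placeholder URL/token: copy the real values from your endpoint's page on HF.
client = InferenceClient(
    model="https://YOUR-ENDPOINT.endpoints.huggingface.cloud",
    token="hf_...",
)

# Send a chat-style request to the deployed model.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Explain what an inference endpoint is."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```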
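And a similar sketch for the Bedrock route with `boto3`, using Bedrock’s Converse API. The region and model ID here are assumptions; verify the exact ID and region availability in the Bedrock console, and note you must request model access there first:

```python
# pip install boto3  (requires AWS credentials with Bedrock model access granted)
import boto3

# Region is an assumption; 405B may only be available in certain regions.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

response = client.converse(
    modelId="meta.llama3-1-405b-instruct-v1:0",  # verify the exact ID in the Bedrock console
    messages=[{"role": "user", "content": [{"text": "Hello from Bedrock!"}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.7},
)
print(response["output"]["message"]["content"][0]["text"])
```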