To use Llama 3.1-405B, do I have to rent a server, or can I send my API requests to someone else's server and pay them through HF?

I’m a complete beginner (I just signed up to HF and am trying to learn how it can be used).

I’d like to use the Llama 3.1 405B model via API requests, and I wonder whether HF lets people who already own servers give other users access to them through API requests. Is that possible?

If so, where on HF’s site can I find such servers?

It seems HF doesn’t allow the 405B model to be used through the serverless Inference API: the model weighs in at roughly 900 GB, while the serverless limit is 10 GB.

Thank you.

You can set up a dedicated Inference Endpoint (Inference Endpoints - Hugging Face). You’ll need to request a quota increase, though, so you can access enough GPUs to actually run the model. If 405B-Instruct is sufficient and you don’t need the base model, you can also set up an API endpoint with AWS Bedrock to access it (Build Generative AI Applications with Foundation Models - Amazon Bedrock - AWS).
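For reference, here’s a minimal sketch of querying a dedicated Inference Endpoint once it’s deployed, using the `huggingface_hub` client. The endpoint URL and token below are placeholders you’d copy from your own endpoint’s page and account settings:

```python
# pip install huggingface_hub
from huggingface_hub import InferenceClient

# Placeholder URL/token: copy the real values from your endpoint's page on HF.
client = InferenceClient(
    model="https://YOUR-ENDPOINT.endpoints.huggingface.cloud",
    token="hf_...",
)

# Send a chat-style request to the deployed model.
response = client.chat_completion(
    messages=[{"role": "user", "content": "Explain what an inference endpoint is."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```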
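And a similar sketch for the Bedrock route with `boto3`, using Bedrock’s Converse API. The region and model ID here are assumptions; verify the exact ID and region availability in the Bedrock console, and note you must request model access there first:

```python
# pip install boto3  (requires AWS credentials with Bedrock model access granted)
import boto3

# Region is an assumption; 405B may only be available in certain regions.
client = boto3.client("bedrock-runtime", region_name="us-west-2")

response = client.converse(
    modelId="meta.llama3-1-405b-instruct-v1:0",  # verify the exact ID in the Bedrock console
    messages=[{"role": "user", "content": [{"text": "Hello from Bedrock!"}]}],
    inferenceConfig={"maxTokens": 256, "temperature": 0.7},
)
print(response["output"]["message"]["content"][0]["text"])
```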