Hello,
I’m working with a RAG system that uses a Hugging Face model. However, the Llama models are too large to load locally. Is there a way to use the model’s API instead of loading it directly, and if so, how?
There are two main ways to do this. The Serverless Inference API is free, but rate limits and shifting model availability make it hard to rely on for anything serious. Inference Endpoints (dedicated) are stable, but they're paid.
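Here's a minimal sketch of calling a model through the Serverless Inference API with `huggingface_hub` instead of loading it locally. The model ID and token are placeholders (assumptions), swap in whichever Llama variant you have access to:

```python
# Minimal sketch: query a hosted model via the Serverless Inference API
# instead of loading weights locally. Requires: pip install huggingface_hub
from huggingface_hub import InferenceClient

client = InferenceClient(
    model="meta-llama/Meta-Llama-3-8B-Instruct",  # placeholder; use a model you have access to
    token="hf_xxx",  # your Hugging Face access token
)

# Chat-style request; the heavy lifting happens on HF's servers
response = client.chat_completion(
    messages=[{"role": "user", "content": "Summarize retrieval-augmented generation in two sentences."}],
    max_tokens=256,
)
print(response.choices[0].message.content)
```

If you later move to a dedicated Inference Endpoint, the same `InferenceClient` works, you just point it at your endpoint URL instead of a model ID.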
There are also third-party inference providers that serve HF models through their own APIs, but I'm not very familiar with them.
There's also a Playground where you can try the Inference API in the browser, so you can test a model there before writing any code.