I don't have a GPU on my PC, so I want to call an API instead, just like OpenAI, Cohere, etc.
I'm looking for a Llama 3 API.
Not sure, but you can check this. It provides free API inference: https://console.groq.com/playground
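Groq's endpoint is OpenAI-compatible, so you can point the regular openai Python client at it. A minimal sketch; the base URL and the llama3-8b-8192 model ID are taken from Groq's docs at the time of writing and may change:

# Sketch: calling Llama 3 through Groq's OpenAI-compatible endpoint.
# Assumptions: base_url and the model ID "llama3-8b-8192" follow Groq's
# documentation and may change over time.
from openai import OpenAI

client = OpenAI(
    base_url="https://api.groq.com/openai/v1",
    api_key="<your-groq-api-key>",  # create one at console.groq.com
)

response = client.chat.completions.create(
    model="llama3-8b-8192",
    messages=[{"role": "user", "content": "Count to 10"}],
)
print(response.choices[0].message.content)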
If you're looking for something free, this Space might be useful.
On a $10/month subscription, you could also use the 70B Llama 3 model, for example. (I use it.)
Like NatureSon said, I use the Groq API playground.
Hi,
HF provides the serverless Inference API to do just that. It comes with OpenAI-compatible APIs.
Usage is as follows (add your HF token):
# instead of `from openai import OpenAI`
from huggingface_hub import InferenceClient

# instead of `client = OpenAI(...)`
client = InferenceClient(
    "meta-llama/Meta-Llama-3-8B-Instruct",
    token="<your-hf-token>",
)

output = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Count to 10"},
    ],
    stream=True,
    max_tokens=1024,
)

# With stream=True, tokens arrive as chunks; the last chunk's delta can be empty.
for chunk in output:
    print(chunk.choices[0].delta.content or "", end="")
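Since the API is OpenAI-compatible, you can also keep the openai client itself and only swap the base URL. A sketch assuming the serverless base URL from HF's docs:

# Same request through the openai package, pointed at HF's serverless endpoint.
# Assumption: the base_url below matches HF's documented OpenAI-compatible route.
from openai import OpenAI

client = OpenAI(
    base_url="https://api-inference.huggingface.co/v1/",
    api_key="<your-hf-token>",
)

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Count to 10"}],
    max_tokens=1024,
)
print(response.choices[0].message.content)

This way existing OpenAI-based code only needs the base URL and API key changed.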
With a PRO subscription, you get higher rate limits.