I just wanted to confirm that I am not charged anything beyond the monthly subscription for using it. This is not a dedicated inference API endpoint, just the normally available one.
There is no indication of this usage in the billing section. I just don’t want surprises.
I’m also interested in this, as I heavily rely on the Inference API (making 1 request per 10 seconds for 24 hours). I searched the documentation but couldn’t find relevant information.
For reference, here’s the code I use to send requests:
```python
from huggingface_hub import AsyncInferenceClient

client = AsyncInferenceClient("meta-llama/Meta-Llama-3-8B-Instruct")
chat = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hi, can I reach the moon by jumping?"},
]
# Top-level await assumes an async context (e.g. a Jupyter notebook or an async main()).
response = await client.chat_completion(chat, max_tokens=100, temperature=0.1)
```
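For completeness, the 1-request-per-10-seconds pacing I mentioned looks roughly like this. This is just a sketch: `dummy_query` is a hypothetical stand-in for the actual `chat_completion` call, and the interval accounting is my own, not anything from the Hugging Face docs:

```python
import asyncio
import time

async def rate_limited(coro_fn, interval_s: float, n_requests: int):
    """Call an async function every `interval_s` seconds, `n_requests` times."""
    results = []
    for _ in range(n_requests):
        start = time.monotonic()
        results.append(await coro_fn())
        # Sleep only for whatever remains of the interval after the request itself.
        elapsed = time.monotonic() - start
        await asyncio.sleep(max(0.0, interval_s - elapsed))
    return results

async def dummy_query():
    # Stand-in for an Inference API call such as client.chat_completion(...).
    return "ok"

results = asyncio.run(rate_limited(dummy_query, interval_s=0.01, n_requests=3))
print(results)
```

In the real loop I simply replace `dummy_query` with a coroutine that awaits `client.chat_completion(...)` and set `interval_s=10`.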