Hi All!
I’d like API access to some of the new SOTA models, like Vicuna 13b.
I found jeffwan/vicuna-13b, and the model page says "Use this model with the Inference API".
I copy the sample code over (token redacted):

```python
import requests

API_URL = "https://api-inference.huggingface.co/models/jeffwan/vicuna-13b"
# Note: "Bearer" and the token are separated by a space, not an underscore
headers = {"Authorization": "Bearer xxx"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({"inputs": "Can you please let us know more details about your "})
print(output)
```
I test with my API key, but see:
```
{'error': 'The model jeffwan/vicuna-13b is too large to be loaded automatically (26GB > 10GB). For commercial use please use PRO spaces (https://huggingface.co/spaces) or Inference Endpoints (https://huggingface.co/inference-endpoints).'}
```
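To rule out an auth problem, the same call can be pointed at a small hosted model. A minimal sanity-check sketch, assuming gpt2 is still served on the free Inference API:

```python
import requests

# Sanity check: identical request against a model small enough for the
# free tier (assumes gpt2 is still hosted on the free Inference API).
API_URL = "https://api-inference.huggingface.co/models/gpt2"
headers = {"Authorization": "Bearer xxx"}  # same token as above

response = requests.post(
    API_URL,
    headers=headers,
    json={"inputs": "Can you please let us know more details about your "},
)
# If generated text comes back here, the token and call format are fine,
# and the failure above is purely about model size.
print(response.json())
```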
I looked at deploying an Inference Endpoint instead (e.g., on a T4 or A100), but it is far too expensive for an individual developer: roughly $5k/month.
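For context on where a number like that comes from, here is a back-of-the-envelope sketch; the hourly rate is an assumed illustrative figure, not HF's actual price list:

```python
# Rough monthly cost for an always-on GPU endpoint.
# The $6.50/hour A100 rate is an assumption for illustration --
# check the current Inference Endpoints pricing page for real numbers.
hourly_rate_usd = 6.50
hours_per_month = 24 * 30
print(f"~${hourly_rate_usd * hours_per_month:,.0f}/month")  # ~$4,680/month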
Does HuggingFace host models such as vicuna-13b?
I was confused by the language "Use this model with the Inference API" on the model page, followed by "The model jeffwan/vicuna-13b is too large to be loaded automatically" at query time, which indicates that the model does not actually work with the Inference API.
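In case it helps anyone hitting the same wall, the size cutoff can be checked up front. A minimal sketch using huggingface_hub (the ~10GB limit is taken from the error message above):

```python
from huggingface_hub import HfApi

# Estimate a repo's total file size before calling the free Inference API,
# which (per the error above) refuses models larger than ~10GB.
api = HfApi()
info = api.model_info("jeffwan/vicuna-13b", files_metadata=True)
total_gb = sum(f.size or 0 for f in info.siblings) / 1e9
print(f"{total_gb:.1f} GB")  # ~26 GB, matching the error message
```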
Thanks!