POST requests to https://router.huggingface.co/hf-inference/models/sentence-transformers/all-MiniLM-L6-v2 suddenly return error code 422 (Unprocessable Content)
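For reference, a minimal sketch of the failing request (the HF_TOKEN env var name is an assumption; the sentence-similarity payload shape follows the model card example, so adjust it if your task differs):

import os
import requests

API_URL = "https://router.huggingface.co/hf-inference/models/sentence-transformers/all-MiniLM-L6-v2"
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}  # assumes a valid token in HF_TOKEN

# Sentence-similarity payload, as shown on the model card
payload = {
    "inputs": {
        "source_sentence": "That is a happy person",
        "sentences": ["That is a happy dog", "Today is a sunny day"],
    }
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.status_code, response.text)  # was suddenly returning 422 Unprocessable Content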
Yeah, I’m having the same issue using the API to send requests from Unity. Even using the inference provider widget on the model page (sentence-transformers/all-MiniLM-L6-v2) directly results in the same error. The problem also seems to affect a lot of the other sentence-transformers models.
I’m getting 404 errors using InferenceClient() on meta-llama/Llama-3.3-70B-Instruct, meta-llama/Llama-3.1-8B-Instruct, Mixtral-8x7B-Instruct-v0.1, and mistralai/Mistral-7B-Instruct-v0.3. Basically any InferenceClient() call fails! So I’m not alone in this?
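For context, a minimal sketch of the kind of call that was failing, using huggingface_hub's InferenceClient (the model and prompt here are just examples):

from huggingface_hub import InferenceClient

client = InferenceClient()  # optionally pass token="hf_..."
response = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)  # was raising a 404 instead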
Same here… @michellehbn
For those stuck with HF, create a free Mistral account to get an API key. Then you can use the class below for text generation. chat_stream() simulates the packets HF’s InferenceClient() would return, so it can serve as a plug-and-play replacement…
import config
from mistralai import Mistral

class TextPacket:
    def __init__(self):
        self.choices = []  # list of TextGroup

class TextMessage:
    def __init__(self):
        self.role: str | None = None
        self.content: str | None = None

class TextGroup:
    def __init__(self):
        self.index = 0
        self.finish_reason: str | None = None
        self.delta: TextMessage = TextMessage()
        self.message: TextMessage = TextMessage()

class MistralGenerator:
    def __init__(self):
        self.api_key = config.MISTRALAI_APIKEY
        self.model = "mistral-small-latest"
        self.client = Mistral(api_key=self.api_key)

    def chat_complete(self, query, max_tokens=512, temperature=0.7, top_p=0.9):
        # Single non-streaming completion for a plain text query.
        chat_response = self.client.chat.complete(
            model=self.model,
            messages=[
                {
                    "role": "user",
                    "content": query,
                },
            ],
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )
        print(chat_response.choices[0].message.content)
        return chat_response.choices[0].message.content

    def chat_stream(self, messages, max_tokens=512, temperature=0.7, top_p=0.9):
        # chat.stream() streams by default, so no extra stream=True argument is needed.
        stream_response = self.client.chat.stream(
            model=self.model,
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )
        for chunk in stream_response:
            # Wrap each delta in the packet shape InferenceClient would yield.
            message = TextPacket()
            group = TextGroup()
            group.index = 0
            group.delta.role = "assistant"
            group.delta.content = chunk.data.choices[0].delta.content or ""
            message.choices.append(group)
            yield message
        # Final packet signalling the end of the stream.
        message = TextPacket()
        group = TextGroup()
        group.index = 0
        group.delta.role = "assistant"
        group.delta.content = ""
        group.finish_reason = "stop"  # finish_reason belongs on the choice, not the delta
        message.choices.append(group)
        yield message
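A quick usage sketch (assumes config.MISTRALAI_APIKEY is set; the prompt is just an example, and consumption mirrors how you would iterate over InferenceClient’s stream):

generator = MistralGenerator()
messages = [{"role": "user", "content": "Summarize the HF 422 issue in one sentence."}]
for packet in generator.chat_stream(messages):
    choice = packet.choices[0]
    if choice.finish_reason == "stop":
        break
    print(choice.delta.content, end="", flush=True)
print()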
Perhaps resolved?
Maybe fixed. From the HF Discord:

Tom Aarsen: “I’ve asked internally, and they indeed reported an issue, but it has been resolved now! Apologies”
Yes, the issue with the sentence transformers has been fixed now.