sentence-transformers/all-MiniLM-L6-v2 Not working all of a sudden

POST requests to https://router.huggingface.co/hf-inference/models/sentence-transformers/all-MiniLM-L6-v2 return error code 422 (Unprocessable Content) all of a sudden.
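For reference, here is a minimal sketch of the kind of request that started failing. The payload follows the standard sentence-similarity task format; the token is a placeholder and the sentences are just examples:

import requests

API_URL = "https://router.huggingface.co/hf-inference/models/sentence-transformers/all-MiniLM-L6-v2"
headers = {"Authorization": "Bearer hf_xxx"}  # replace with your own token

# Sentence-similarity payload: one source sentence compared against candidates
payload = {
    "inputs": {
        "source_sentence": "That is a happy person",
        "sentences": ["That is a happy dog", "Today is a sunny day"],
    }
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.status_code)  # was returning 422 (Unprocessable Content)
print(response.json())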

Yeah, I’m having the same issue using the API to send requests from Unity. Even using the inference provider directly from the model page (sentence-transformers/all-MiniLM-L6-v2 · Hugging Face) results in the same error. It also seems to affect a lot of the other sentence-transformers models.

I’m getting 404 errors using InferenceClient() on meta-llama/Llama-3.3-70B-Instruct, meta-llama/Llama-3.1-8B-Instruct, Mixtral-8x7B-Instruct-v0.1, and mistralai/Mistral-7B-Instruct-v0.3. Basically any InferenceClient() call fails. Surely I’m not alone in this?
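For anyone trying to reproduce, a minimal sketch of the kind of call that was failing (the model name and prompt are just examples; the token is a placeholder):

from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_xxx")  # replace with your own token

# Calls like this were returning 404 at the time of this thread
response = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=[{"role": "user", "content": "Hello!"}],
    max_tokens=64,
)
print(response.choices[0].message.content)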

Same here… @michellehbn

For those stuck with HF, get a free Mistral account and grab an API key. Then you can use the class below for text generation. The chat_stream() method simulates the packets HF’s InferenceClient() would return, so it can be a plug-and-play replacement…


import config  # local module holding MISTRALAI_APIKEY; adjust to your own setup
from mistralai import Mistral


# Minimal stand-ins for the chunk objects HF's InferenceClient() yields,
# so downstream code that reads choices[0].delta.content keeps working.
class TextPacket:
    def __init__(self):
        self.choices = []

class TextMessage:
    def __init__(self):
        self.role: str = None
        self.content: str = None

class TextGroup:
    def __init__(self):
        self.index = 0
        self.finish_reason: str = None
        self.delta: TextMessage = TextMessage()
        self.message: TextMessage = TextMessage()

class MistralGenerator:
    def __init__(self):
        self.api_key = config.MISTRALAI_APIKEY
        self.model = "mistral-small-latest"
        self.client = Mistral(api_key=self.api_key)

    def chat_complete(self, query, max_tokens=512, temperature=0.7, top_p=0.9):
        # One-shot completion: send a single user message and return the text.
        chat_response = self.client.chat.complete(
            model=self.model,
            messages=[
                {
                    "role": "user",
                    "content": query,
                },
            ],
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )
        print(chat_response.choices[0].message.content)
        return chat_response.choices[0].message.content

    def chat_stream(self, messages, max_tokens=512, temperature=0.7, top_p=0.9):
        # chat.stream() already streams; no separate stream=True flag is needed.
        stream_response = self.client.chat.stream(
            model=self.model,
            messages=messages,
            max_tokens=max_tokens,
            temperature=temperature,
            top_p=top_p,
        )

        for chunk in stream_response:
            # Wrap each Mistral chunk in the InferenceClient-style packet shape.
            message = TextPacket()
            group = TextGroup()
            group.index = 0
            group.delta.role = "assistant"
            group.delta.content = chunk.data.choices[0].delta.content
            message.choices.append(group)
            yield message

        # Final stop packet: finish_reason lives on the choice, not the delta.
        message = TextPacket()
        group = TextGroup()
        group.index = 0
        group.delta.role = "assistant"
        group.delta.content = ""
        group.finish_reason = "stop"
        message.choices.append(group)
        yield message
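
As a usage sketch, downstream code that previously iterated over InferenceClient’s stream can consume this generator unchanged (the message list here is just an example):

gen = MistralGenerator()
messages = [{"role": "user", "content": "Write a haiku about GPUs."}]
for packet in gen.chat_stream(messages):
    choice = packet.choices[0]
    if choice.finish_reason == "stop":
        break
    if choice.delta.content:
        print(choice.delta.content, end="", flush=True)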

Perhaps resolved?

Maybe fixed. From HF Discord:

Tom Aarsen
I’ve asked internally, and they indeed reported an issue, but it has been resolved now! Apologies

Yes, the issue with the sentence transformers has been fixed now.
