LangChain HuggingFace endpoints error

I’m calling the model via Inference Endpoints and I keep getting the same error; I can’t invoke the LLM properly.

import os

from langchain_huggingface import HuggingFaceEndpoint

repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(repo_id=repo_id, max_new_tokens=128, temperature=0.7, huggingfacehub_api_token=os.getenv("HF_TOKEN"), provider="auto")
llm

llm.invoke("what is machine Learning?")

ValueError: Model mistralai/Mistral-7B-Instruct-v0.2 is not supported for task text-generation and provider featherless-ai. Supported task: conversational.

I got the same error for the Gemma and Qwen models.
I even tried the task="" parameter and also a prompt + LLM chain, but it still didn’t work!


Using APIs from LangChain can be confusing.


Cause: your code is asking Hugging Face for the text-generation task, but the provider (featherless-ai) only exposes that model as conversational (chat), and HuggingFaceEndpoint currently always calls the text-generation route under the hood. That mismatch triggers the ValueError.

I will break it into:

  1. What is actually happening
  2. Why task="" or task="conversational" does nothing
  3. Concrete fixes (with code)
  4. Short summary at the end

1. What is actually happening

Your code:

import os

from langchain_huggingface import HuggingFaceEndpoint

repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=128,
    temperature=0.7,
    huggingfacehub_api_token=os.getenv("HF_TOKEN"),
    provider="auto",
)

llm.invoke("what is machine Learning?")

Despite the wording “Inference endpoints”, this call is not using a dedicated Inference Endpoint URL. It is using the Hugging Face router + Inference Providers:

  • repo_id="mistralai/Mistral-7B-Instruct-v0.2" + token ⇒ call router.huggingface.co with a Hub model id.
  • provider="auto" ⇒ the router chooses a provider (for you it picked featherless-ai). (Hugging Face)
  • HuggingFaceEndpoint internally wraps huggingface_hub.InferenceClient and, for LLM usage, calls text_generation, which implies the "text-generation" task. (Hugging Face)

On the provider side:

  • Each provider has a task mapping for every model, e.g. { "conversational": {...}, "text-generation": {...} } (you can inspect this mapping yourself; see the sketch after this list).
  • For some chat-type models (Mistral, Gemma, Qwen, Llama 3 Instruct, etc.), some providers expose only the conversational/chat task.
  • When a client asks for a task that is not in that mapping, the provider helper raises exactly the error you see. (GitHub)
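
If you want to check that mapping for a specific model yourself, you can query the Hub model API with the inferenceProviderMapping expand parameter. A minimal sketch (the exact field names and response shape may change as the API evolves):

import requests

model_id = "mistralai/Mistral-7B-Instruct-v0.2"
url = f"https://huggingface.co/api/models/{model_id}"

# Ask the Hub to include the provider -> task mapping in the response.
resp = requests.get(url, params={"expand[]": "inferenceProviderMapping"})
resp.raise_for_status()

mapping = resp.json().get("inferenceProviderMapping", {})
for provider, details in mapping.items():
    # Typically prints something like: featherless-ai -> conversational
    print(provider, "->", details.get("task"))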

You can see the same error text in public threads with the same model and provider:

  • Reddit: user calling mistralai/Mistral-7B-Instruct-v0.2 via HF gets
    Model mistralai/Mistral-7B-Instruct-v0.2 is not supported for task text-generation and provider featherless-ai. Supported task: conversational. (reddit.com)

LangChain’s own issue tracker shows the same story for Mistral v0.3 with another provider (together). (GitHub)

So:

  • Client side: LangChain → InferenceClient.text_generation() → task=text-generation.
  • Server side: provider mapping for that model says: {"conversational": ...} only.
  • Result: ValueError: supported task: conversational.

The same logic explains why Gemma and Qwen models fail in the same way for you: those provider–model pairs are also configured as chat-only. The same restriction is reported for Qwen and Gemma in HF Hub issues. (GitHub)


2. Why task="" or task="conversational" did not help

You tried:

  • Setting task=""
  • Setting task="conversational"
  • Using chains on top of the LLM

None of that changed the error. Reason:

  • The actual task is determined by which method the HF client uses, not by a free-form task string you pass through LangChain.

    • InferenceClient.text_generation(...) ⇒ task "text-generation".
    • InferenceClient.chat_completion(...) ⇒ task "chat-completion" (mapped to conversational under the hood). (Hugging Face)
  • HuggingFaceEndpoint currently uses text_generation for its LLM calls, regardless of your task parameter. This is exactly what shows up in LangChain bug reports: stack traces always go into InferenceClient.text_generation. (GitHub)

  • Wrapping the LLM into a chain (LLMChain, RAG chain, etc.) does not change the underlying method; it just changes how prompts are built.

So modifying task in the HuggingFaceEndpoint constructor does not switch the client from “text-generation” to “conversational/chat”. You are still hitting the text-generation route, so the mismatch stays.


3. Solutions

3.1 Use the chat abstraction for these models

For models that providers expose only as conversational/chat, treat them as chat models, not plain text generators.

Use ChatHuggingFace on top of HuggingFaceEndpoint and send a list of messages:

import os

from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace
from langchain_core.messages import HumanMessage

repo_id = "mistralai/Mistral-7B-Instruct-v0.2"

base_llm = HuggingFaceEndpoint(
    repo_id=repo_id,
    max_new_tokens=128,
    temperature=0.7,
    huggingfacehub_api_token=os.getenv("HF_TOKEN"),
    provider="auto",  # router chooses provider (featherless-ai in your case)
)

chat_llm = ChatHuggingFace(llm=base_llm)

resp = chat_llm.invoke([HumanMessage(content="What is machine learning?")])
print(resp.content)

Why this works:

  • ChatHuggingFace is designed to wrap an underlying HF LLM and convert a message list into the right prompt format using the model’s chat_template (a rough illustration of that step follows this list).
  • Under the hood, for conversational/task mappings, it uses the chat-style API (or compatible handling), which aligns with the provider’s "conversational" support.
  • You stay inside LangChain, so you can still use chains, RAG, agents, etc.
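
To see roughly what that chat-template step produces, you can apply the template by hand with transformers. This is only an illustration of the mechanism, not literally what ChatHuggingFace runs on every call:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-7B-Instruct-v0.2")

messages = [{"role": "user", "content": "What is machine learning?"}]

# Renders the messages with the model's built-in chat template,
# e.g. "<s>[INST] What is machine learning? [/INST]" for Mistral Instruct.
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)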

For chains, you then use chat_llm instead of llm:

from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Explain {topic} in simple terms.")
chain = LLMChain(llm=chat_llm, prompt=prompt)

print(chain.invoke({"topic": "machine learning"}))
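
On recent LangChain versions where LLMChain is deprecated, the equivalent LCEL pipe (reusing the chat_llm from 3.1) looks like this:

from langchain_core.prompts import PromptTemplate

prompt = PromptTemplate.from_template("Explain {topic} in simple terms.")

# PromptTemplate -> ChatHuggingFace; the chat model returns an AIMessage.
chain = prompt | chat_llm

print(chain.invoke({"topic": "machine learning"}).content)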

3.2 Use a real Inference Endpoint URL instead of repo_id

If you really do want “Inference Endpoints” in the sense of a dedicated endpoint you deployed in the HF UI, then do this:

  1. In the HF web UI, create an Inference Endpoint for your model (e.g. Mistral-7B).
  2. Copy the endpoint URL, something like:
    https://abc1234.us-east-1.aws.endpoints.huggingface.cloud (Hugging Face)
  3. Use endpoint_url instead of repo_id:
import os

from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="https://<your-endpoint>.endpoints.huggingface.cloud",
    max_new_tokens=128,
    temperature=0.7,
    huggingfacehub_api_token=os.getenv("HF_TOKEN"),
    # do not set provider here; you are hitting your own endpoint
)

print(llm.invoke("What is machine learning?"))

Why this often fixes the issue:

  • A dedicated endpoint typically runs a TGI-like text-generation server or similar.
  • You are no longer subject to the “provider X only supports conversational” logic of the router.
  • The endpoint just gets a generation payload and returns text (you can verify this outside LangChain; see the sketch below).
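
If you want to sanity-check the endpoint outside LangChain first, huggingface_hub’s InferenceClient accepts the endpoint URL directly (the URL below is a placeholder):

import os

from huggingface_hub import InferenceClient

# Point the client at your dedicated endpoint instead of a Hub repo id.
client = InferenceClient(
    model="https://<your-endpoint>.endpoints.huggingface.cloud",
    token=os.getenv("HF_TOKEN"),
)

print(client.text_generation("What is machine learning?", max_new_tokens=128))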

This is closer to the older “Inference Endpoints + LangChain” examples that you see in blog posts. (Medium)

3.3 Pick a model–provider combo that supports text-generation

If you insist on using llm.invoke("...") as plain text-generation via the router (no chat wrapper), then:

  1. Check the model’s Inference Providers section or call the inferenceProviderMapping API for that model (see the inspection sketch in section 1). That mapping tells you which tasks each provider supports. (GitHub)
  2. Find a provider that lists "text-generation" (or chat-completion in a way compatible with text generation) for that model.
  3. Pin that provider explicitly:
import os

from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="some-model-that-has-text-generation-on-provider-X",
    provider="hf-inference",  # or together, or another provider that supports text-generation for that model
    max_new_tokens=128,
    temperature=0.7,
    huggingfacehub_api_token=os.getenv("HF_TOKEN"),
)

print(llm.invoke("What is machine learning?"))

Caveats:

  • Not every provider supports text-generation for every model. Many modern “instruct” models are chat-only for some providers. HF issues for Qwen/Gemma/Mistral show exactly that: “Supported task: conversational” only. (GitHub)
  • If you switch to a provider or model that only supports conversational, you will hit the same error again.

3.4 Bypass LangChain, confirm behavior with InferenceClient, then wrap

For debugging or finer control, you can talk to the HF API directly and then wrap it in LangChain.

For chat:

from huggingface_hub import InferenceClient
import os

client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token=os.getenv("HF_TOKEN"),
    # provider="auto" or a specific provider if you want
)

messages = [{"role": "user", "content": "What is machine learning?"}]
out = client.chat_completion(messages, max_tokens=128)
print(out.choices[0].message.content)

For text-generation (will fail in your current provider config):

text = client.text_generation("What is machine learning?", max_new_tokens=128)
print(text)

This reproduces the provider error directly and confirms it is not a LangChain-only issue. HF docs describe these two methods and the implied tasks. (Hugging Face)

Some projects implement a fallback like:

  • Try text_generation.
  • If the exception message contains "supported task: conversational", retry using chat_completion (sketch below). (Hugging Face)
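
A minimal sketch of that fallback, assuming the error text stays stable across huggingface_hub versions:

import os

from huggingface_hub import InferenceClient

client = InferenceClient(
    model="mistralai/Mistral-7B-Instruct-v0.2",
    token=os.getenv("HF_TOKEN"),
)

prompt = "What is machine learning?"

try:
    # Ask for plain text-generation first.
    text = client.text_generation(prompt, max_new_tokens=128)
except ValueError as err:
    if "conversational" not in str(err).lower():
        raise
    # Provider exposes the model as chat only: retry via chat_completion.
    out = client.chat_completion(
        [{"role": "user", "content": prompt}], max_tokens=128
    )
    text = out.choices[0].message.content

print(text)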

4. Short bullet summary

  • The error means: “Provider only exposes this model as conversational/chat, but your code is asking for the text-generation task.”

  • In LangChain, HuggingFaceEndpoint uses InferenceClient.text_generation, which implies task="text-generation", regardless of task="" or task="conversational" in your code.

  • For mistralai/Mistral-7B-Instruct-v0.2 with featherless-ai, the provider mapping is chat-only, so you get the Supported task: conversational error; Gemma and Qwen can have the same issue with their providers. (reddit.com)

  • Fix options:

    • Use ChatHuggingFace + message lists (treat it as a chat model).
    • Or use a real Inference Endpoint URL (endpoint_url) where you control the handler.
    • Or choose a model–provider pair that actually supports text-generation.
    • Or call InferenceClient.chat_completion directly and then wrap it in a custom LangChain class (a sketch is at the end of this post).

You are not misusing LangChain’s API; you are hitting a provider task mismatch. The safest approach for these models via providers is to use chat semantics (ChatHuggingFace or chat_completion).
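
For the last option, here is one possible sketch of a tiny custom LangChain LLM that routes plain prompts through chat_completion. The class name and fields are made up for illustration; treat it as a starting point, not an official API:

import os
from typing import Any, List, Optional

from huggingface_hub import InferenceClient
from langchain_core.language_models.llms import LLM


class ChatCompletionLLM(LLM):
    """Hypothetical wrapper: sends the prompt as a single user message via chat_completion."""

    repo_id: str
    hf_token: Optional[str] = None
    max_tokens: int = 128

    @property
    def _llm_type(self) -> str:
        return "hf_chat_completion_wrapper"

    def _call(self, prompt: str, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
        # `stop` is ignored in this sketch.
        client = InferenceClient(model=self.repo_id, token=self.hf_token)
        out = client.chat_completion(
            [{"role": "user", "content": prompt}], max_tokens=self.max_tokens
        )
        return out.choices[0].message.content


llm = ChatCompletionLLM(
    repo_id="mistralai/Mistral-7B-Instruct-v0.2",
    hf_token=os.getenv("HF_TOKEN"),
)
print(llm.invoke("What is machine learning?"))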