JSON Schema Response Format NOT Working: invalid_request_error

I know I’m doing something wrong here, but can’t figure it out.

Problem

I’m trying to make a chat-completion call constrained by a JSON Schema generated from a Pydantic model, but the API rejects the request with a 422 error, and I can’t discern from the message what I’m doing wrong.

Error

Traceback (most recent call last):
  File "/home/michael/miniconda3/envs/patents_llm_update/lib/python3.13/site-packages/huggingface_hub/utils/_http.py", line 409, in hf_raise_for_status
    response.raise_for_status()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/michael/miniconda3/envs/patents_llm_update/lib/python3.13/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://router.huggingface.co/cerebras/v1/chat/completions

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/michael/git/patents_llm_update/llm_functions/models.py", line 56, in <module>
    test()
    ~~~~^^
  File "/home/michael/git/patents_llm_update/llm_functions/models.py", line 44, in test
    response = test_ep.chat.completions.create(messages=[user],  #
                                               max_tokens=4096,  #
                                               response_format=response_format,  #
                                               stream=False)
  File "/home/michael/miniconda3/envs/patents_llm_update/lib/python3.13/site-packages/huggingface_hub/inference/_client.py", line 992, in chat_completion
    data = self._inner_post(request_parameters, stream=stream)
  File "/home/michael/miniconda3/envs/patents_llm_update/lib/python3.13/site-packages/huggingface_hub/inference/_client.py", line 357, in _inner_post
    hf_raise_for_status(response)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/home/michael/miniconda3/envs/patents_llm_update/lib/python3.13/site-packages/huggingface_hub/utils/_http.py", line 482, in hf_raise_for_status
    raise _format(HfHubHTTPError, str(e), response) from e
huggingface_hub.errors.HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://router.huggingface.co/cerebras/v1/chat/completions (Request ID: Root=1-67feb932-3458407e60b0d8c90c4c7c59;d65a18ba-52ad-489e-a90c-5b3e9792cebc)
{"message":"type: Input should be 'text'","type":"invalid_request_error","param":"validation_error","code":"wrong_api_format"}

Code

HF Inference Client

from huggingface_hub import InferenceClient


def llama_40_scout_instruct(wait: bool = False) -> InferenceClient:
    hf_login_check()  # local helper: ensures a valid HF login/token
    model = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
    token = hf_bearer_token()  # local helper: returns the bearer token
    ep = InferenceClient(provider="cerebras", model=model, api_key=token)
    return ep

Pydantic Model

from pydantic import BaseModel, Field


class SPRCompressed(BaseModel):
    """
    Sparse Priming Representation (SPR): An SPR is a specific kind of use of language for advanced
    NLP, NLU, and NLG tasks.

    An SPR is input distilled down to a list of succinct statements, assertions, associations,
    concepts, analogies, and metaphors.

    An SPR captures as much, conceptually, as possible, but with as few words as possible. An SPR
    is written in a way that makes sense to an LLM, as the future audience will be another language
    model, not a human. Use complete sentences that are grammatically correct. Do not use
    abbreviations, as they can be ambiguous.
    """

    spr: str = Field(description="Sparse Priming Representation")
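
For reference, `SPRCompressed.model_json_schema()` produces roughly the following (the description key carries the class docstring, abbreviated here):

{'description': 'Sparse Priming Representation (SPR): An SPR is ...',
 'properties': {'spr': {'description': 'Sparse Priming Representation',
                        'title': 'Spr',
                        'type': 'string'}},
 'required': ['spr'],
 'title': 'SPRCompressed',
 'type': 'object'}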

Main code

test_application.md: any reasonably large, complex text
compress_prompt.md: instructions on how to compress text to SPR format

import pathlib

from huggingface_hub import ChatCompletionInputGrammarType, InferenceClient


def test():
    print("Testing SPRCompressed")
    print(SPRCompressed.model_json_schema())
    print()
    # test_app is the text to compress; compress_prompt holds the SPR instructions
    test_app = pathlib.Path(__file__).parent.joinpath("test_application.md").read_text()
    compress_prompt = pathlib.Path(__file__).parent.joinpath(
            "../prompts/compress_prompt.md").read_text()
    prompt = compress_prompt.format(text=test_app)
    test_ep: InferenceClient = llama_40_scout_instruct()
    user = {"role": "user", "content": [{"type": "text", "text": prompt}]}

    response_format: ChatCompletionInputGrammarType = ChatCompletionInputGrammarType(
            type="json",
            value=SPRCompressed.model_json_schema())

    print("=" * 40)
    response = test_ep.chat.completions.create(messages=[user],
                                               max_tokens=4096,
                                               response_format=response_format,
                                               stream=False)
    print(response.choices[0].message.content)
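
Side note: given the wrong_api_format code, I wonder whether the Cerebras route wants the OpenAI-style response_format shape rather than the TGI grammar object. A sketch of that variant; the name and strict fields are guesses on my part, not confirmed against any Cerebras docs:

# Hypothetical alternative: OpenAI-style response_format dict. The
# "name" and "strict" fields are my guesses, not confirmed anywhere.
response_format_openai = {
        "type": "json_schema",
        "json_schema": {
                "name": "SPRCompressed",
                "schema": SPRCompressed.model_json_schema(),
                "strict": True}}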

Environment

huggingface-cli env


- huggingface_hub version: 0.30.2
- Platform: Linux-6.8.0-57-generic-x86_64-with-glibc2.39
- Python version: 3.13.3
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /home/michael/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: michael-newsrx-com
- Configured git credential helpers: store
- FastAI: N/A
- Tensorflow: N/A
- Torch: N/A
- Jinja2: 3.1.6
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: N/A
- pydantic: 2.11.3
- aiohttp: N/A
- hf_xet: N/A
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/michael/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/michael/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/michael/.cache/huggingface/token
- HF_STORED_TOKENS_PATH: /home/michael/.cache/huggingface/stored_tokens
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10

Function calling in TGI and Transformers has been buggy for a while now. If it hasn’t been fixed yet, that might be the cause here…

I’m passing in a JSON schema, and even if this were a tool call to a routine, the API should still return a request to run the tool, a JSON response, or neither; it shouldn’t raise exceptions about a badly formatted request.

The symptoms of the infinite tool-recursion issue and this issue are not the same.
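
To illustrate, this is the shape of handling I’d expect to write, assuming the usual OpenAI-style response object (a sketch, not tested against this provider):

msg = response.choices[0].message
if getattr(msg, "tool_calls", None):
    # Model asked for a tool to be run: inspect the requested call(s).
    for call in msg.tool_calls:
        print(call.function.name, call.function.arguments)
elif msg.content:
    # Model answered directly, here expected to be schema-constrained JSON.
    print(msg.content)
else:
    print("Model returned neither a tool call nor content.")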
