I know I’m doing something wrong here, but I can’t figure out what.
Problem
I’m trying to make a JSON Schema-constrained chat completion call using a Pydantic model’s schema, but I’m getting a 422 API error with no indication that I can discern of what I’m doing wrong.
Error
Traceback (most recent call last):
  File "/home/michael/miniconda3/envs/patents_llm_update/lib/python3.13/site-packages/huggingface_hub/utils/_http.py", line 409, in hf_raise_for_status
    response.raise_for_status()
    ~~~~~~~~~~~~~~~~~~~~~~~~~^^
  File "/home/michael/miniconda3/envs/patents_llm_update/lib/python3.13/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 422 Client Error: Unprocessable Entity for url: https://router.huggingface.co/cerebras/v1/chat/completions
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/home/michael/git/patents_llm_update/llm_functions/models.py", line 56, in <module>
    test()
    ~~~~^^
  File "/home/michael/git/patents_llm_update/llm_functions/models.py", line 44, in test
    response = test_ep.chat.completions.create(messages=[user], #
                                               max_tokens=4096, #
                                               response_format=response_format, #
                                               stream=False)
  File "/home/michael/miniconda3/envs/patents_llm_update/lib/python3.13/site-packages/huggingface_hub/inference/_client.py", line 992, in chat_completion
    data = self._inner_post(request_parameters, stream=stream)
  File "/home/michael/miniconda3/envs/patents_llm_update/lib/python3.13/site-packages/huggingface_hub/inference/_client.py", line 357, in _inner_post
    hf_raise_for_status(response)
    ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/home/michael/miniconda3/envs/patents_llm_update/lib/python3.13/site-packages/huggingface_hub/utils/_http.py", line 482, in hf_raise_for_status
    raise _format(HfHubHTTPError, str(e), response) from e
huggingface_hub.errors.HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://router.huggingface.co/cerebras/v1/chat/completions (Request ID: Root=1-67feb932-3458407e60b0d8c90c4c7c59;d65a18ba-52ad-489e-a90c-5b3e9792cebc)
{"message":"type: Input should be 'text'","type":"invalid_request_error","param":"validation_error","code":"wrong_api_format"}
Code
HF Inference Client
from huggingface_hub import InferenceClient


def llama_40_scout_instruct(wait: bool = False) -> InferenceClient:
    hf_login_check()
    model = "meta-llama/Llama-4-Scout-17B-16E-Instruct"
    token = hf_bearer_token()
    ep = InferenceClient(provider="cerebras", model=model, api_key=token)
    return ep
Pydantic Model
from pydantic import BaseModel, Field


class SPRCompressed(BaseModel):
    """
    Sparse Priming Representation (SPR): An SPR is a specific kind of use of language for advanced
    NLP, NLU, and NLG tasks.
    An SPR is input distilled down to a list of succinct statements, assertions, associations,
    concepts, analogies, and metaphors.
    An SPR captures as much, conceptually, as possible, but with as few words as possible. An SPR
    is written in a way that makes sense to an LLM, as the future audience will be another language
    model, not a human. Use complete sentences that are grammatically correct. Do not use
    abbreviations, as they can be ambiguous.
    """
    spr: str = Field(description="Sparse Priming Representation")
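For reference, this is roughly what SPRCompressed.model_json_schema() prints for the model above (the description is abbreviated here to keep the post short):

# Approximate Pydantic v2 schema output for SPRCompressed; description abbreviated.
expected_schema = {
    "description": "Sparse Priming Representation (SPR): ...",
    "properties": {
        "spr": {
            "description": "Sparse Priming Representation",
            "title": "Spr",
            "type": "string",
        },
    },
    "required": ["spr"],
    "title": "SPRCompressed",
    "type": "object",
}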
Main code
test_application.md: any largish, complex text
compress_prompt.md: instructions on how to compress text to the SPR format
import pathlib

from huggingface_hub import ChatCompletionInputGrammarType, InferenceClient


def test():
    print("Testing SPRCompressed")
    print(SPRCompressed.model_json_schema())
    print()
    # Load the text to compress and the instructions for compressing it.
    test_app = pathlib.Path(__file__).parent.joinpath("test_application.md").read_text()
    compress_prompt = pathlib.Path(__file__).parent.joinpath(
        "../prompts/compress_prompt.md").read_text()
    prompt = compress_prompt.format(text=test_app)
    test_ep: InferenceClient = llama_40_scout_instruct()
    user = {"role": "user", "content": [{"type": "text", "text": prompt}]}
    response_format: ChatCompletionInputGrammarType = ChatCompletionInputGrammarType(
        "json",  # positional args map to type="json", value=<the JSON Schema dict>
        SPRCompressed.model_json_schema())
    print("=" * 40)
    response = test_ep.chat.completions.create(messages=[user],
                                               max_tokens=4096,
                                               response_format=response_format,
                                               stream=False)
    print(response.choices[0].message.content)
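For comparison, the OpenAI-style structured-output shape is sketched below. Whether the HF router / Cerebras path accepts a plain dict like this in place of ChatCompletionInputGrammarType is an assumption I have not verified:

# OpenAI-style json_schema response_format; an untested assumption for this router.
response_format_openai = {
    "type": "json_schema",
    "json_schema": {
        "name": "SPRCompressed",  # hypothetical label; any identifier should do
        "strict": True,
        "schema": SPRCompressed.model_json_schema(),
    },
}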
Environment
huggingface-cli env
- huggingface_hub version: 0.30.2
- Platform: Linux-6.8.0-57-generic-x86_64-with-glibc2.39
- Python version: 3.13.3
- Running in iPython ?: No
- Running in notebook ?: No
- Running in Google Colab ?: No
- Running in Google Colab Enterprise ?: No
- Token path ?: /home/michael/.cache/huggingface/token
- Has saved token ?: True
- Who am I ?: michael-newsrx-com
- Configured git credential helpers: store
- FastAI: N/A
- Tensorflow: N/A
- Torch: N/A
- Jinja2: 3.1.6
- Graphviz: N/A
- keras: N/A
- Pydot: N/A
- Pillow: N/A
- hf_transfer: N/A
- gradio: N/A
- tensorboard: N/A
- numpy: N/A
- pydantic: 2.11.3
- aiohttp: N/A
- hf_xet: N/A
- ENDPOINT: https://huggingface.co
- HF_HUB_CACHE: /home/michael/.cache/huggingface/hub
- HF_ASSETS_CACHE: /home/michael/.cache/huggingface/assets
- HF_TOKEN_PATH: /home/michael/.cache/huggingface/token
- HF_STORED_TOKENS_PATH: /home/michael/.cache/huggingface/stored_tokens
- HF_HUB_OFFLINE: False
- HF_HUB_DISABLE_TELEMETRY: False
- HF_HUB_DISABLE_PROGRESS_BARS: None
- HF_HUB_DISABLE_SYMLINKS_WARNING: False
- HF_HUB_DISABLE_EXPERIMENTAL_WARNING: False
- HF_HUB_DISABLE_IMPLICIT_TOKEN: False
- HF_HUB_ENABLE_HF_TRANSFER: False
- HF_HUB_ETAG_TIMEOUT: 10
- HF_HUB_DOWNLOAD_TIMEOUT: 10