Problem arose after one week: Bad request: "Task not found for this model"

Hello, sorry for the unusual request. I'm working on an assignment where we need to fine-tune a model using Unsloth and then publish a UI on Hugging Face. I created my model, "davnas/Italian_Cuisine_1.2", using the Unsloth Colab, and I successfully uploaded it to Hugging Face.

Last week, everything was working fine. However, yesterday, after switching from my friend’s model back to mine, I started encountering the following error:

Bad request: Task not found for this model

Even after attempting to revert to the original setup, the problem persists.

Could someone please help me troubleshoot this issue? Apologies for any mistakes in the description—I’m still learning and just getting started. Thank you!

I usually perform inference with

from unsloth import FastLanguageModel
import torch

max_seq_length = 2048  # Choose any! We auto support RoPE scaling internally!
dtype = None  # None for auto detection. Float16 for Tesla T4/V100, bfloat16 for Ampere+

model_name_or_path = "davnas/Italian_Cousine_1.2"

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name=model_name_or_path,
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=True,  # load the weights in 4-bit to save memory
    # token = "hf_...",  # if our model is not public
    # Use one if using gated models like meta-llama/Llama-2-7b-hf
)

and then run generation with:


from unsloth import FastLanguageModel
from transformers import TextStreamer

FastLanguageModel.for_inference(model)  # Enable native 2x faster inference

messages = [
    {"role": "user", "content": "How can I cook a smoothie?"},
]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,  # Must add for generation
    return_tensors="pt",
).to("cuda")

text_streamer = TextStreamer(tokenizer, skip_prompt=True)  # stream tokens as they are generated, skipping the prompt

model.generate(input_ids=inputs, streamer=text_streamer, max_new_tokens=128,
               use_cache=True, temperature=1.5, min_p=0.1)

The code used for training the model can be found here.

Lastly, the full error is:

===== Application Startup at 2024-12-07 09:20:41 =====

/usr/local/lib/python3.10/site-packages/gradio/components/chatbot.py:228: UserWarning: The 'tuples' format for chatbot messages is deprecated and will be removed in a future version of Gradio. Please set type='messages' instead, which uses openai-style 'role' and 'content' keys.
  warnings.warn(
* Running on local URL:  http://0.0.0.0:7860, with SSR ⚡

To create a public link, set `share=True` in `launch()`.
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 406, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api-inference.huggingface.co/models/davnas/Italian_Cousine_1.2/v1/chat/completions

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 622, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2016, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1581, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 691, in async_iteration
    return await anext(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 796, in asyncgen_wrapper
    response = await iterator.__anext__()
  File "/usr/local/lib/python3.10/site-packages/gradio/chat_interface.py", line 667, in _stream_fn
    first_response = await async_iteration(generator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 691, in async_iteration
    return await anext(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 685, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 668, in run_sync_iterator_async
    return next(iterator)
  File "/home/user/app/app.py", line 30, in respond
    for message in client.chat_completion(
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/inference/_client.py", line 842, in chat_completion
    data = self.post(model=model_url, json=payload, stream=stream)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/inference/_client.py", line 305, in post
    hf_raise_for_status(response)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 460, in hf_raise_for_status
    raise _format(BadRequestError, message, response) from e
huggingface_hub.errors.BadRequestError: (Request ID: BwTVVHXLmrcuAlpoxR2bH)

Bad request:
Task not found for this model
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 406, in hf_raise_for_status
    response.raise_for_status()
  File "/usr/local/lib/python3.10/site-packages/requests/models.py", line 1024, in raise_for_status
    raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 400 Client Error: Bad Request for url: https://api-inference.huggingface.co/models/davnas/Italian_Cousine_1.2/v1/chat/completions

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/gradio/queueing.py", line 622, in process_events
    response = await route_utils.call_process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/route_utils.py", line 323, in call_process_api
    output = await app.get_blocks().process_api(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 2016, in process_api
    result = await self.call_function(
  File "/usr/local/lib/python3.10/site-packages/gradio/blocks.py", line 1581, in call_function
    prediction = await utils.async_iteration(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 691, in async_iteration
    return await anext(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 796, in asyncgen_wrapper
    response = await iterator.__anext__()
  File "/usr/local/lib/python3.10/site-packages/gradio/chat_interface.py", line 667, in _stream_fn
    first_response = await async_iteration(generator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 691, in async_iteration
    return await anext(iterator)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 685, in __anext__
    return await anyio.to_thread.run_sync(
  File "/usr/local/lib/python3.10/site-packages/anyio/to_thread.py", line 56, in run_sync
    return await get_async_backend().run_sync_in_worker_thread(
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 2441, in run_sync_in_worker_thread
    return await future
  File "/usr/local/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 943, in run
    result = context.run(func, *args)
  File "/usr/local/lib/python3.10/site-packages/gradio/utils.py", line 668, in run_sync_iterator_async
    return next(iterator)
  File "/home/user/app/app.py", line 30, in respond
    for message in client.chat_completion(
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/inference/_client.py", line 842, in chat_completion
    data = self.post(model=model_url, json=payload, stream=stream)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/inference/_client.py", line 305, in post
    hf_raise_for_status(response)
  File "/usr/local/lib/python3.10/site-packages/huggingface_hub/utils/_http.py", line 460, in hf_raise_for_status
    raise _format(BadRequestError, message, response) from e
huggingface_hub.errors.BadRequestError: (Request ID: GueVsTq3CZ9CNO2c2jVEg)

Bad request:
Task not found for this model


The Serverless Inference API is currently degraded, and serving has been turned off for all but the most popular models. Since Unsloth models are relatively popular, yours may still work in some cases…
This is probably the cause.
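If you want to check whether that is what is happening, you can ask the serverless API for the model's deployment status. A minimal sketch, assuming a recent huggingface_hub (get_model_status is a method of InferenceClient; the repo id is the one from the traceback above):

from huggingface_hub import InferenceClient

client = InferenceClient()  # pass token="hf_..." if the repo is private
# Returns a ModelStatus with fields such as loaded, state and framework;
# if the model is not deployed on the serverless API, this call may error instead.
status = client.get_model_status("davnas/Italian_Cousine_1.2")
print(status)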

There has also been a long-standing problem where the API does not work properly if the model repo lacks a proper README.md. This is the configuration problem mentioned below.
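For reference, "Task not found for this model" usually means the API cannot work out which task (pipeline) the repo should be served with, and that task is read from the YAML metadata at the top of the model card (README.md). A minimal sketch of setting it from Python, assuming the missing piece is the pipeline_tag / library_name metadata (metadata_update is part of huggingface_hub):

from huggingface_hub import metadata_update

# Write pipeline_tag / library_name into the model card's YAML metadata
# so the Inference API knows to serve the repo as a text-generation model.
metadata_update(
    "davnas/Italian_Cousine_1.2",
    {"pipeline_tag": "text-generation", "library_name": "transformers"},
    overwrite=True,
    token="hf_...",  # a write token for the repo owner
)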

So, what are our options then? I’m also very new and got here through a tutorial. I’m not sure what the alternative is to the Inference API, or if I would have to migrate my project somewhere else.


To be honest, there is no substitute for the Inference API. There are plenty of paid services online, but we’ll exclude them this time.

What you can do for free on HF is use the Inference API from Gradio, and this is still relatively easy even for new user-created models, as long as the model size is within 10GB (there is a limit here).
For more details, please see below.
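As a concrete illustration, a Space can wrap the hosted model directly with gr.load, which builds a ready-made demo around the Inference API. A minimal sketch, assuming the model is still being served for free (the "models/" prefix tells Gradio to load it from the Hub):

import gradio as gr

# Build a demo around the hosted model; pass a token if the repo is private
# (the argument is named token on Gradio 5, hf_token on older versions).
demo = gr.load("models/davnas/Italian_Cousine_1.2")
demo.launch()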

Also, if you want high-speed inference, the Pro subscription ($9 a month) comes with 10 units of ZeroGPU Space quota, which is convenient. But it's quite tricky to use, and there is also a 25-minute GPU usage limit per day, so it's not as easy as the Inference API. It can do a lot of things, though, and is extremely powerful…
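The tricky part of ZeroGPU is mostly that the GPU is only attached while a decorated function is running. A rough sketch of the pattern, assuming the repo contains merged transformers weights and the Space has the spaces package installed:

import spaces
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "davnas/Italian_Cousine_1.2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)
model.to("cuda")  # on ZeroGPU this is deferred until a GPU is actually attached

@spaces.GPU  # a GPU is allocated only while this function executes
def generate(prompt: str) -> str:
    inputs = tokenizer(prompt, return_tensors="pt").to("cuda")
    outputs = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)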

Thanks for all the help. I'm kind of stuck on Gradio, though, because I'm getting this error. I'm not sure if there's something wrong with how I made the original repo.


Never mind, I got it working. Apparently, in app.py the hf_token parameter needs to be replaced with token.
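In case it helps anyone else, a minimal sketch of what that change looks like, assuming app.py wraps the model with gr.load (Gradio 5 renamed the hf_token argument to token):

import os
import gradio as gr

# Old: gr.load("models/davnas/Italian_Cousine_1.2", hf_token=os.environ["HF_TOKEN"])
demo = gr.load("models/davnas/Italian_Cousine_1.2", token=os.environ["HF_TOKEN"])
demo.launch()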


Is there any point at which we can expect the Inference API to be back up? I tried getting it to work with the Gradio Space, but it didn't seem to help much.


I don’t think it will be restored, because the problem is caused by a shortage of shared resources.

Well… darn. I guess I'll just have to either figure out how to modify my code to work with the Gradio Space or figure out how to work with the Pro features. Thanks for the help either way.
