Failing to deploy newer models

Hi,

I am trying to deploy Qwen2.5-VL-7B-Instruct and DeepSeek-VL2-tiny to an Inference Endpoints instance (GPU · Nvidia L40S · 1x GPU · 48 GB).
This is what I get (pretty much the same for both):

[Server message]Endpoint failed to start
Exit code: 3. Reason: py", line 569, in __aenter__
    await self._router.startup()
  File "/usr/local/lib/python3.11/dist-packages/starlette/routing.py", line 670, in startup
    await handler()
  File "/app/webservice_starlette.py", line 62, in prepare_model_artifacts
    inference_handler = get_inference_handler_either_custom_or_default_handler(
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/huggingface_inference_toolkit/handler.py", line 159, in get_inference_handler_either_custom_or_default_handler
    return HuggingFaceHandler(model_dir=model_dir, task=task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/app/huggingface_inference_toolkit/handler.py", line 22, in __init__
    self.pipeline = get_pipeline(
                    ^^^^^^^^^^^^^
  File "/app/huggingface_inference_toolkit/utils.py", line 252, in get_pipeline
    hf_pipeline = pipeline(task=task, model=model_dir, device=device, **kwargs)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/pipelines/__init__.py", line 849, in pipeline
    config = AutoConfig.from_pretrained(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/models/auto/configuration_auto.py", line 1073, in from_pretrained
    raise ValueError(
ValueError: The checkpoint you are trying to load has model type `deepseek_vl_v2` but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.

You can update Transformers with the command `pip install --upgrade transformers`. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command `pip install git+https://github.com/huggingface/transformers.git`

Application startup failed. Exiting.
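So the stock container's Transformers release apparently has no entry for the `deepseek_vl_v2` (or `qwen2_5_vl`) model type in its architecture registry. The same check reproduces locally (a sketch; it assumes Hub access and whichever Transformers version the environment happens to have):

    from transformers import AutoConfig

    # Raises the same ValueError on releases that do not know the checkpoint's
    # model_type; succeeds once the installed release supports the architecture.
    AutoConfig.from_pretrained("Qwen/Qwen2.5-VL-7B-Instruct")    # needs a release with qwen2_5_vl
    AutoConfig.from_pretrained("deepseek-ai/deepseek-vl2-tiny")  # deepseek_vl_v2 may still need a source install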

Then I try to use a custom container:

huggingface/transformers-all-latest-gpu:latest

And this is what I get:

[Server message]Endpoint failed to start
Exit code: 0. Reason: application does not seem to be initialized

What am I doing wrong?
Any tips would be highly appreciated.


It looks like the Transformers version in the default container is too old to recognize DeepSeek, but can users upgrade Transformers in an Inference Endpoint…?


My understanding is that the default container ships a version of Transformers that is too old to run those models.
That should be solvable by using a custom container with a newer version.
The only problem is that the custom container doesn’t work either, for some reason.
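A likely explanation for the second failure (Exit code: 0, "application does not seem to be initialized"): huggingface/transformers-all-latest-gpu is a CI image used for testing the library, and it does not start any web server, so the endpoint's health check never passes. A custom container for Inference Endpoints has to run an HTTP server that answers on the port and health route configured in the endpoint settings. A minimal sketch of that contract (the port, routes, and payload shape below are assumptions, not something confirmed in this thread):

    # server.py: minimal sketch of the HTTP contract a custom container must serve.
    # Assumes container port 8080 and health route /health, matching whatever is
    # configured in the endpoint settings; the request/response format is illustrative.
    import uvicorn
    from fastapi import FastAPI, Request

    app = FastAPI()

    @app.get("/health")
    def health():
        # The platform polls this route; a 200 response marks the app as initialized.
        return {"status": "ok"}

    @app.post("/")
    async def predict(request: Request):
        payload = await request.json()
        # ... load/run the model on payload["inputs"] here ...
        return {"generated_text": "..."}

    if __name__ == "__main__":
        uvicorn.run(app, host="0.0.0.0", port=8080)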


The default container ships a Transformers version from last March, so it is quite old.
If the custom container doesn’t work, I guess users will have to do something with the requirements.txt, e.g. pin a newer release:

transformers==4.48.2

Default container:

  • transformers[sklearn,sentencepiece,audio,vision]: 4.38.2
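If the goal is just a newer library inside the default container, Inference Endpoints also picks up a requirements.txt from the model repository root, and a handler.py with an EndpointHandler class replaces the default pipeline. A rough sketch for the Qwen model (the class name and __call__ signature follow the custom-handler convention; the processor call and generation parameters are illustrative, and note that 4.48.2 still predates Qwen2.5-VL support, which I believe landed in 4.49.0; DeepSeek-VL2 may not be covered by any release yet):

    # requirements.txt (repo root), hypothetical pin:
    #   transformers>=4.49.0

    # handler.py (repo root)
    from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

    class EndpointHandler:
        def __init__(self, path=""):
            # path is the local model directory the toolkit passes in
            self.processor = AutoProcessor.from_pretrained(path)
            self.model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
                path, torch_dtype="auto", device_map="auto"
            )

        def __call__(self, data):
            # Text-only round trip for brevity; images would go through the processor too.
            inputs = self.processor(text=data["inputs"], return_tensors="pt").to(self.model.device)
            output_ids = self.model.generate(**inputs, max_new_tokens=128)
            return {"generated_text": self.processor.batch_decode(output_ids, skip_special_tokens=True)[0]}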

Hi @noanem, have you deleted the endpoint? We can take a closer look.

You may already know this, but we have a catalog of ready-to-deploy models that don’t require additional customization: Inference Catalog | Inference Endpoints by Hugging Face. Some of those might work for your use case.
