ValueError: Unsupported model type mllama

I am looking to deploy "meta-llama/Llama-3.2-11B-Vision-Instruct" to AWS SageMaker, but I am getting the error below while deploying.

Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/cli.py", line 118, in serve
    server.serve(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 297, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/server.py", line 231, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.10/site-packages/text_generation_server/models/__init__.py", line 1117, in get_model
    raise ValueError(f"Unsupported model type {model_type}")

SageMaker config:

instance_type = "ml.g5.4xlarge"
number_of_gpu = 1
health_check_timeout = 1000

Define the Model and Endpoint configuration parameters:

import json

config = {
    "HF_MODEL_ID": "meta-llama/Llama-3.2-11B-Vision-Instruct",
    "SM_NUM_GPUS": json.dumps(number_of_gpu),  # Number of GPUs used per replica
    "MAX_INPUT_LENGTH": json.dumps(6000),      # Max length of the input text
    "MAX_TOTAL_TOKENS": json.dumps(8192),      # Max length of the generation (including input text)
}
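
The deploy step itself is the usual HuggingFaceModel pattern from the sagemaker SDK. A minimal sketch using the config above; the execution role and the HF_TOKEN value are placeholders (the repo is gated, so the container needs a Hugging Face token to download the weights):

import sagemaker
from sagemaker.huggingface import HuggingFaceModel

role = sagemaker.get_execution_role()  # placeholder: your SageMaker execution role

llm_model = HuggingFaceModel(
    role=role,
    # the TGI DLC I am deploying with (see below)
    image_uri="763104351884.dkr.ecr.eu-central-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-gpu-py310-cu121-ubuntu22.04",
    env={**config, "HF_TOKEN": "<your-hf-token>"},
)

llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    container_startup_health_check_timeout=health_check_timeout,
)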

I am using the container 763104351884.dkr.ecr.eu-central-1.amazonaws.com/huggingface-pytorch-tgi-inference:2.3.0-gpu-py310-cu121-ubuntu22.04.

Should I be using a different container?

It looks like 2.3.0 is still compatible with Llama 3.2. Do other models work?

The above issue is resolved after switching to TGI container 2.4.
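
For anyone hitting the same error: one way to pick up the 2.4 image is to let the SDK resolve the ECR URI instead of hard-coding it. This assumes the installed sagemaker SDK already lists a 2.4.x TGI image (upgrade the SDK if the lookup fails):

from sagemaker.huggingface import get_huggingface_llm_image_uri

# Resolves the regional Hugging Face TGI DLC URI for the requested version.
llm_image = get_huggingface_llm_image_uri("huggingface", version="2.4.0")
print(llm_image)

Pass the result as image_uri to HuggingFaceModel with the same config as above.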
