Can't perform image inference with Gemma 3 12B IT QAT int4

Hi, I am a noob here. Could you please share a code snippet showing how to use this Gemma 3 version to perform inference on images? Specifically, I want it to filter images from an input folder into different output folders based on a set of criteria I outline in the prompt. The prompt tells the model to answer YES or NO depending on whether an image meets the criteria, and my code then uses that answer to move each image to the appropriate folder.


Here is my prompt:
<image_soft_token>
Analyze the image. Does it meet BOTH criteria: 1. At least 2 football players visible. 2. At least one player performing a clear football action (kick, tackle, dribble, save etc.)? Answer ONLY YES or NO.


Here is the output:
ERROR:root:STEP 3 FAILED: Error during processor preparation for laliga_image_100.jpeg: Prompt contained 0 image tokens but received 1 images.
Traceback (most recent call last):
  File "", line 40, in analyze_image_gemma3_transformers
    inputs = processor(text=PROMPT_TEXT_CLASSIFY, images=img, return_tensors="pt").to(device)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/dist-packages/transformers/models/gemma3/processing_gemma3.py", line 122, in __call__
    raise ValueError(
ValueError: Prompt contained 0 image tokens but received 1 images.
Found 457 image files in 'Colab_Uploads/Football_Images_Input'.
Starting Gemma 3 Transformers processing loop for 457 images...
Processing time depends on hardware (cuda).
--- DIAGNOSTIC MODE: Processing ONLY the first file: laliga_image_100.jpeg ---
--- Starting analysis for: laliga_image_100.jpeg ---
STEP 2 SUCCESS: Loaded image laliga_image_100.jpeg
DEBUG: Prompt being passed to processor:

<image_soft_token>
Analyze the image. Does it meet BOTH criteria: 1. At least 2 football players visible. 2. At least one player performing a clear football action (kick, tackle, dribble, save etc.)? Answer ONLY YES or NO.

<<<
--- Finished analysis attempt for: laliga_image_100.jpeg ---
--- DIAGNOSTIC MODE: Finished processing laliga_image_100.jpeg ---
--- Gemma 3 Transformers Processing Session Complete ---
Images attempted in this session (Gemma3 TF): 1
  Successfully classified (YES/NO): 0
  Errors (moved to 'Football_Images_Errors_Gemma3_TF'): 1
  Images skipped (already processed): 0
  Estimated image files remaining in 'Colab_Uploads/Football_Images_Input': 456
Check the 'Football_Images_Errors_Gemma3_TF' folder for Gemma 3 TF processing errors.
Results are in 'Football_Images_Meets_Criteria_Gemma3_TF' and 'Football_Images_Does_Not_Meet_Gemma3_TF'.

Here is Gemini 2.5 Pro's suggestion:
Okay, the added print statement confirms it perfectly.

  1. The Variable is Correct: The DEBUG: Prompt being passed to processor: output clearly shows the string does start with <image_soft_token>\n… So, the variable PROMPT_TEXT_CLASSIFY is correctly updated and passed to the function.

  2. The Processor Fails: Despite receiving the correct prompt string containing the <image_soft_token>, the processor’s internal logic (/usr/local/lib/python3.11/dist-packages/transformers/models/gemma3/processing_gemma3.py, line 122) still fails to detect it and incorrectly reports finding 0 image tokens.

Conclusion:

This definitively looks like a bug within the Gemma3Processor implementation in the transformers library specifically for the model handle google/gemma-3/transformers/gemma-3-12b-it-qat-int4-unquantized (or perhaps for Gemma 3 processing in general in the current library version).

The processor is simply not correctly parsing the special token it claims to use (<image_soft_token>) from the text input when an image is also provided.


I am running this in Google Colab. Note that the <image_soft_token> tag shown above is included in my actual prompt exactly as written, angle brackets and all.

Gemini 2.5 Pro says it is a bug in the transformers implementation for this model and that I should report it on their GitHub, but I just want to be sure it isn't actually due to my own lack of knowledge. I would be very grateful for any help with this.


It seems that this error occurs when you pass an image file but the prompt does not mark the image in the way the processor expects; a raw <image_soft_token> string is not the placeholder the processor counts, and the chat template inserts the correct one for you.

It should be easier to understand if you look at the sample code for Gemma 3.

messages = [
    {
        "role": "system",
        "content": [{"type": "text", "text": "You are a helpful assistant."}]
    },
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"}, # Maybe you don't have such line
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    }
]
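
For the folder-sorting part of your question, here is a minimal, untested sketch of how that pattern can be adapted to local files. The model id, folder names, file extension and generation settings below are assumptions (the folder names are taken from your log), so adjust them to the exact QAT checkpoint and paths you are using.

import shutil
from pathlib import Path

import torch
from transformers import AutoProcessor, Gemma3ForConditionalGeneration

# Placeholders -- point these at your own checkpoint and folders.
MODEL_ID = "google/gemma-3-12b-it"  # swap in the QAT checkpoint you are loading
INPUT_DIR = Path("Colab_Uploads/Football_Images_Input")
YES_DIR = Path("Football_Images_Meets_Criteria_Gemma3_TF")
NO_DIR = Path("Football_Images_Does_Not_Meet_Gemma3_TF")

PROMPT_TEXT_CLASSIFY = (
    "Analyze the image. Does it meet BOTH criteria: 1. At least 2 football players "
    "visible. 2. At least one player performing a clear football action (kick, tackle, "
    "dribble, save etc.)? Answer ONLY YES or NO."
)

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = Gemma3ForConditionalGeneration.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
).eval()

def classify_image(image_path: Path) -> str:
    """Return the model's YES/NO answer for a single image file."""
    messages = [
        {
            "role": "user",
            "content": [
                # A local file path works here; a URL or PIL image should too.
                {"type": "image", "image": str(image_path)},
                {"type": "text", "text": PROMPT_TEXT_CLASSIFY},
            ],
        }
    ]
    # The chat template inserts the image placeholder tokens the processor expects,
    # so there is no need to write <image_soft_token> into the prompt by hand.
    inputs = processor.apply_chat_template(
        messages, add_generation_prompt=True, tokenize=True,
        return_dict=True, return_tensors="pt",
    ).to(model.device, dtype=torch.bfloat16)
    input_len = inputs["input_ids"].shape[-1]
    with torch.inference_mode():
        generation = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    # Decode only the newly generated tokens, not the prompt.
    answer = processor.decode(generation[0][input_len:], skip_special_tokens=True)
    return answer.strip().upper()

YES_DIR.mkdir(parents=True, exist_ok=True)
NO_DIR.mkdir(parents=True, exist_ok=True)
for path in sorted(INPUT_DIR.glob("*.jpeg")):  # adjust the glob for other extensions
    answer = classify_image(path)
    target_dir = YES_DIR if answer.startswith("YES") else NO_DIR
    shutil.move(str(path), str(target_dir / path.name))
    print(f"{path.name}: {answer}")

The key point is that apply_chat_template builds the prompt with the image placeholder the processor is looking for, which is exactly what the "Prompt contained 0 image tokens" check was complaining about; you can wrap your existing error-folder handling around the classify_image call.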