hi, i am a noob here. please can you share a code snippet for how to use this gemma 3 version to perform inference on images? specific i want it to filter images from an input folder into different output folder based on a set of criteria i outline in the prompt. the prompt tells it to answer yes or no, if an image meets or doesnât meet the criteria, then my code uses that to move the images to their appropriate folders.
here is my prompt:
image_soft_token
Analyze the image. Does it meet BOTH criteria: 1. At least 2 football players visible. 2. At least one player performing a clear football action (kick, tackle, dribble, save etc.)? Answer ONLY YES or NO.
here is the output:
ERROR:root:STEP 3 FAILED: Error during processor preparation for laliga_image_100.jpeg: Prompt contained 0 image tokens but received 1 images.
Traceback (most recent call last):
File ââ, line 40, in analyze_image_gemma3_transformers
inputs = processor(text=PROMPT_TEXT_CLASSIFY, images=img, return_tensors=âptâ).to(device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File â/usr/local/lib/python3.11/dist-packages/transformers/models/gemma3/processing_gemma3.pyâ, line 122, in call
raise ValueError(
ValueError: Prompt contained 0 image tokens but received 1 images.
Found 457 image files in âColab_Uploads/Football_Images_Inputâ.
Starting Gemma 3 Transformers processing loop for 457 imagesâŚ
Processing time depends on hardware (cuda).
â DIAGNOSTIC MODE: Processing ONLY the first file: laliga_image_100.jpeg â
â Starting analysis for: laliga_image_100.jpeg â
STEP 2 SUCCESS: Loaded image laliga_image_100.jpeg
DEBUG: Prompt being passed to processor:
image_soft_token
Analyze the image. Does it meet BOTH criteria: 1. At least 2 football players visible. 2. At least one player performing a clear football action (kick, tackle, dribble, save etc.)? Answer ONLY YES or NO.
<<<
â Finished analysis attempt for: laliga_image_100.jpeg â
â DIAGNOSTIC MODE: Finished processing laliga_image_100.jpeg â
â Gemma 3 Transformers Processing Session Complete â
Images attempted in this session (Gemma3 TF): 1
- Successfully classified (YES/NO): 0
- Errors (moved to âFootball_Images_Errors_Gemma3_TFâ): 1
Images skipped (already processed): 0
Estimated image files remaining in âColab_Uploads/Football_Images_Inputâ: 456
Check the âFootball_Images_Errors_Gemma3_TFâ folder for Gemma 3 TF processing errors.
Results are in âFootball_Images_Meets_Criteria_Gemma3_TFâ and âFootball_Images_Does_Not_Meet_Gemma3_TFâ.
here is gemini 2.5 proâs suggestion:
Okay, the added print statement confirms it perfectly.
-
The Variable is Correct: The DEBUG: Prompt being passed to processor: output clearly shows the string does start with <image_soft_token>\n⌠So, the variable PROMPT_TEXT_CLASSIFY is correctly updated and passed to the function.
-
The Processor Fails: Despite receiving the correct prompt string containing the <image_soft_token>, the processorâs internal logic (/usr/local/lib/python3.11/dist-packages/transformers/models/gemma3/processing_gemma3.py, line 122) still fails to detect it and incorrectly reports finding 0 image tokens.
Conclusion:
This definitively looks like a bug within the Gemma3Processor implementation in the transformers library specifically for the model handle google/gemma-3/transformers/gemma-3-12b-it-qat-int4-unquantized (or perhaps for Gemma 3 processing in general in the current library version).
The processor is simply not correctly parsing the special token it claims to use (<image_soft_token>) from the text input when an image is also provided.
i am running this in google colab. i had to remove the <> tag symbols from image_soft_token to display here to indicate that i include the tag in the prompt
Gemini 2.5 pro says it is a bug with the transformer architecture for this model, that i should report it on their github, but i just want to be sure it isnât actually due to my lack of knowledge. i would be very grateful for any help on this