The image name should be relevant to the image content and be 2-6 words long.
I think it would be sufficient to give this small general-purpose VLM a system prompt such as “Generate within 6 words”, generate the output, and then post-process it as a string in Python…
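For the string-processing step, here is a minimal sketch. It assumes the VLM's reply is already available as a plain Python string; the function name `clean_image_name`, the lowercase underscore-joined output, and the choice to use only the first non-empty line are my own assumptions, not anything the model or a library prescribes.

```python
import re
from typing import Optional


def clean_image_name(raw: str, min_words: int = 2, max_words: int = 6) -> Optional[str]:
    """Turn raw VLM text into a name of min_words..max_words words.

    Hypothetical helper: returns None when the text has too few words,
    so the caller can decide to retry the generation.
    """
    # Small models sometimes add extra lines; keep only the first non-empty one.
    lines = [line.strip() for line in raw.splitlines() if line.strip()]
    if not lines:
        return None

    # Replace punctuation, quotes and underscores with spaces, then split into words.
    text = re.sub(r"[\W_]+", " ", lines[0])
    words = text.lower().split()

    # Enforce the upper bound by truncating, since the prompt alone may not be obeyed.
    words = words[:max_words]

    # Enforce the lower bound by rejecting the output.
    if len(words) < min_words:
        return None

    return "_".join(words)


if __name__ == "__main__":
    print(clean_image_name("A red bicycle leaning against an old brick wall."))
    print(clean_image_name("Cat"))
```

Truncating handles outputs that run past 6 words, while returning `None` for outputs shorter than 2 words leaves it to the caller to regenerate or fall back to a default name.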