What is the current SOTA model for captioning images in documents?
I need good descriptions of diagrams. Most of the models I have seen produce very basic descriptions, such as “the image contains a woman in a blue dress”. I need something more like “The figure shows a flowchart representing a process of… that starts with… and ends with… key steps are…”
Or “The image depicts a scene in which people walk about in a modern cafe; key elements of the cafe's design are…”
In other words, I need a good paragraph that offers some insight into the image.
Any suggestions on models?