Extracting text and relevant images from documents like PDF, DOC, etc.

My use case is to extract the relevant text along with the images available in a file using generative AI. When a prompt is given, the relevant text and images should be displayed as the response.
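For illustration, this use case is often framed as retrieval: extract text passages from the file, attach the images found near each passage (for example, on the same page), and for a given prompt return the best-matching passage together with its images. The minimal sketch below covers only the ranking step, under stated assumptions: the `Chunk` structure, the page-based image association, and the sample data are hypothetical, extraction is assumed to happen elsewhere, and a simple word-count cosine similarity stands in for the embedding model a generative AI setup would normally use.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Chunk:
    """Hypothetical unit of extracted content: a text passage plus images
    assumed to belong to it (here: images from the same page)."""
    text: str
    page: int
    images: list[str] = field(default_factory=list)


def tokenize(text: str) -> Counter:
    # Lowercased word counts; a real system would use embeddings instead.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two bag-of-words vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(prompt: str, chunks: list[Chunk], top_k: int = 1) -> list[Chunk]:
    # Rank chunks by similarity to the prompt and keep the best top_k;
    # each returned chunk carries its associated images along with the text.
    query = tokenize(prompt)
    ranked = sorted(chunks, key=lambda c: cosine(query, tokenize(c.text)), reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Hypothetical extracted content; file names are placeholders.
    chunks = [
        Chunk("The pump assembly consists of an impeller and a motor housing.", 3, ["page3_img1.png"]),
        Chunk("Quarterly revenue grew by 12 percent compared to last year.", 7, ["page7_chart.png"]),
    ]
    for hit in retrieve("show me the pump impeller diagram", chunks):
        print(hit.page, hit.text, hit.images)
```

Keeping the images attached to the text chunk means whatever ranks the text (keywords here, embeddings in practice) also decides which images are returned, so the response can show both together.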

Kindly help by providing some ideas, links or techniques.
Thank you.