Fine-tuning - Possible to extract captions embedded in PNG files?

Since PNG files can hold text, it seems like this would be supported by default, but I can't figure out how to extract the captions from the PNG files when setting up my training images. Is it even possible? :thinking:

Thank you for any help!
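
For context on the "PNG can hold text" part: the PNG format stores text in `tEXt`, `zTXt` and `iTXt` chunks, each holding a keyword and a string. Here is a minimal standard-library sketch of reading those chunks. The function name `read_png_text`, the path `train/0001.png` and the keyword `caption` are hypothetical, and the keyword your captions are actually stored under depends on the tool that wrote the files. If Pillow is installed, `Image.open(path).text` should give a similar dictionary for PNG files.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data: bytes) -> dict:
    """Return {keyword: text} collected from tEXt, zTXt and iTXt chunks.

    Hypothetical helper, not part of any training library. It skips CRC
    validation and keeps only the last value seen for a repeated keyword.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")

    found = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk is: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length

        if ctype == b"tEXt":
            # keyword, NUL, Latin-1 text
            key, _, text = body.partition(b"\x00")
            found[key.decode("latin-1")] = text.decode("latin-1")
        elif ctype == b"zTXt":
            # keyword, NUL, compression method byte, zlib-compressed Latin-1 text
            key, _, rest = body.partition(b"\x00")
            found[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"iTXt":
            # keyword, NUL, compression flag, compression method,
            # language tag, NUL, translated keyword, NUL, UTF-8 text
            key, _, rest = body.partition(b"\x00")
            is_compressed = rest[0] == 1
            _lang, _, rest = rest[2:].partition(b"\x00")
            _translated, _, text = rest.partition(b"\x00")
            if is_compressed:
                text = zlib.decompress(text)
            found[key.decode("latin-1")] = text.decode("utf-8")
        elif ctype == b"IEND":
            break
    return found


# Usage (hypothetical path and keyword):
# from pathlib import Path
# chunks = read_png_text(Path("train/0001.png").read_bytes())
# caption = chunks.get("caption")
```

The resulting dictionary could then be written out in whatever caption format your training script expects.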

The Donut model might help you extract captions from images. Below are some good resources for fine-tuning Donut: