Since PNG can hold text, it seems like this would be supported by default, but I can't figure out how to extract the captions from the PNG files when setting up my training images. Is it even possible?
Thank you for any help!
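If the captions are stored in the PNG's own text chunks (`tEXt`, `zTXt`, or `iTXt`), they can probably be read directly without any model. Here is a minimal sketch that uses only the Python standard library. It assumes the captions really are embedded as text chunks; the function name `read_png_text` and the keyword `caption` are made up for illustration, and the keyword your files use may differ.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def read_png_text(data):
    """Return {keyword: text} for the tEXt, zTXt, and iTXt chunks in PNG bytes.

    Hypothetical helper for illustration; CRCs are not verified.
    """
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("not a PNG file")
    texts = {}
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, body, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"tEXt":
            # keyword, NUL, Latin-1 text
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"zTXt":
            # keyword, NUL, compression method byte, zlib-compressed text
            key, _, rest = body.partition(b"\x00")
            texts[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"iTXt":
            # keyword, NUL, compression flag, compression method,
            # language tag, NUL, translated keyword, NUL, UTF-8 text
            key, _, rest = body.partition(b"\x00")
            compressed = rest[0] == 1
            _lang, _, rest = rest[2:].partition(b"\x00")
            _translated, _, value = rest.partition(b"\x00")
            if compressed:
                value = zlib.decompress(value)
            texts[key.decode("latin-1")] = value.decode("utf-8")
        elif ctype == b"IEND":
            break
    return texts


# Example usage (the file name is a placeholder):
# with open("image_0001.png", "rb") as f:
#     print(read_png_text(f.read()).get("caption"))
```

Printing the whole dictionary for one file first should show which keyword, if any, holds the caption. If nothing comes back, the captions are likely not stored as PNG text chunks at all.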
The Donut model might help you extract the captions from images. Below are some good resources for fine-tuning Donut: