Since PNG files can hold text, it seems like this would be supported by default, but I can't figure out how to extract the captions from the PNG files when setting up my training images. Is it even possible?
Thank you for any help!
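For reference, text embedded in a PNG lives in `tEXt`, `zTXt`, or `iTXt` chunks, and those can be read without any training framework. Below is a minimal sketch using only the Python standard library. It assumes the captions were actually written into such chunks by whatever tool produced the images; the function name `read_png_text_chunks` is made up for illustration, and the code does not verify CRCs.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"

def read_png_text_chunks(data: bytes) -> dict:
    """Return {keyword: text} from the tEXt, zTXt, and iTXt chunks of PNG bytes."""
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("not a PNG file")
    texts = {}
    pos = 8
    while pos + 8 <= len(data):
        # Each chunk: 4-byte big-endian length, 4-byte type, data, 4-byte CRC.
        length, ctype = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        pos += 12 + length
        if ctype == b"tEXt":
            key, _, value = body.partition(b"\x00")
            texts[key.decode("latin-1")] = value.decode("latin-1")
        elif ctype == b"zTXt":
            key, _, rest = body.partition(b"\x00")
            # rest[0] is the compression method byte; the zlib stream follows.
            texts[key.decode("latin-1")] = zlib.decompress(rest[1:]).decode("latin-1")
        elif ctype == b"iTXt":
            key, _, rest = body.partition(b"\x00")
            compressed = rest[0] == 1
            # Skip the flag and method bytes, then language tag and translated keyword.
            _lang, _, rest = rest[2:].partition(b"\x00")
            _translated_key, _, text = rest.partition(b"\x00")
            if compressed:
                text = zlib.decompress(text)
            texts[key.decode("latin-1")] = text.decode("utf-8")
        elif ctype == b"IEND":
            break
    return texts
```

To use it on a file, something like `read_png_text_chunks(open("image.png", "rb").read())` returns a dict of keywords to text. Which keyword holds the caption depends on the tool that wrote the file, so print the dict for one image first. If Pillow is installed, `Image.open(path).text` should expose the same chunks.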
The Donut model might help you extract captions from images. Below are some good resources for fine-tuning Donut: