Is it possible to use Vision Encoder Decoder model for extracting text in document and then classifying the extracted texts

YanaS · April 12, 2023, 11:01am

I have created Vision encoder-decoder model (swin/bert) for image-to-text extraction. But instead of telling the model to predict the ground truth, I want it to classify the already-founded texts. Is this possible and do I explain it well enough?

Topic		Replies	Views
Img2seq model with pretrained weights Beginners	7	1215	November 18, 2021
Image captioning for Japanese with pre-trained vision and text model Flax/JAX Projects	0	1165	June 23, 2021
Question on text input in image captioning Beginners	0	268	December 4, 2022
Image captioning for Spanish with pre-trained vision and text model Flax/JAX Projects	13	2474	July 19, 2021
Attaching a vision decoder to VisionTextDualEncoder Models	0	264	May 10, 2023

Is it possible to use Vision Encoder Decoder model for extracting text in document and then classifying the extracted texts

Related topics