Attaching a vision decoder to VisionTextDualEncoder

spencerh · May 10, 2023, 3:57pm

Hi!

I’ve been scouring the forums and the internet, and there seems to be very little documentation about VisionTextDualEncoder, specifically about how to use the unified embedding space for tasks that require a decoder, such as image segmentation. Can anyone help or point me to the next step? To stay in the HF ecosystem, I could imagine creating a child model class that derives from VisionTextDualEncoder and then adds a classifier at the end. Thanks!
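To make the idea concrete, here is a rough sketch of what such a model might look like. It is only one possible reading of "add a classifier at the end", not an official recipe. Assumptions on my side: the class name `DualEncoderSegmenter` is made up, it wraps the dual encoder instead of subclassing it, the vision tower is ViT-style (a CLS token followed by a square grid of patch tokens), the shared `visual_projection` is reused on every patch token, and the head is a plain 1x1 convolution with bilinear upsampling. The tiny randomly initialized configs are there only so the snippet runs without downloading weights.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import (
    BertConfig,
    ViTConfig,
    VisionTextDualEncoderConfig,
    VisionTextDualEncoderModel,
)


class DualEncoderSegmenter(nn.Module):
    """Hypothetical wrapper: dual encoder plus a per-patch classification head."""

    def __init__(self, dual_encoder, num_labels):
        super().__init__()
        self.dual_encoder = dual_encoder
        # 1x1 conv = a linear classifier applied independently to each patch.
        self.head = nn.Conv2d(
            dual_encoder.config.projection_dim, num_labels, kernel_size=1
        )

    def forward(self, pixel_values):
        height, width = pixel_values.shape[-2:]
        # Per-token features from the vision tower: (batch, 1 + patches, hidden).
        tokens = self.dual_encoder.vision_model(
            pixel_values=pixel_values
        ).last_hidden_state
        # Assumption: ViT-style layout, so the first token is CLS and is dropped.
        patches = tokens[:, 1:, :]
        # Reuse the shared-space projection on each patch token (an assumption;
        # the stock model applies it to the pooled output only).
        patches = self.dual_encoder.visual_projection(patches)
        batch, num_patches, dim = patches.shape
        side = int(num_patches ** 0.5)  # assumes a square patch grid
        grid = patches.transpose(1, 2).reshape(batch, dim, side, side)
        logits = self.head(grid)
        # Upsample coarse patch logits back to the input resolution.
        return F.interpolate(
            logits, size=(height, width), mode="bilinear", align_corners=False
        )


# Tiny random-init configs, chosen only to keep the example small and offline.
vision_config = ViTConfig(
    hidden_size=32, num_hidden_layers=1, num_attention_heads=2,
    intermediate_size=64, image_size=32, patch_size=8,
)
text_config = BertConfig(
    hidden_size=32, num_hidden_layers=1, num_attention_heads=2,
    intermediate_size=64, vocab_size=100,
)
config = VisionTextDualEncoderConfig.from_vision_text_configs(
    vision_config, text_config, projection_dim=16
)
segmenter = DualEncoderSegmenter(VisionTextDualEncoderModel(config), num_labels=5)
segmenter.eval()

with torch.no_grad():
    logits = segmenter(torch.randn(2, 3, 32, 32))
```

With these settings `logits` has one channel per label at the input resolution. In practice you would load pretrained towers instead of random ones and likely swap the 1x1 head for a proper decoder; whether projecting individual patch tokens into the shared space gives useful features is something to verify empirically.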
