Use only the encoder to generate image embeddings in a VisionEncoderDecoderModel such as Donut

I want to run a pre-trained Donut checkpoint on my document images to generate an embedding for each image, which I can then use in a subsequent pipeline. I don't need the decoder output, only the image embeddings that are fed into the decoder to generate the text completion. Any ideas or code samples on how to achieve this?
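One way to sketch this, assuming the `transformers` library and the public `naver-clova-ix/donut-base` checkpoint (substitute your own fine-tuned checkpoint): a `VisionEncoderDecoderModel` exposes its Swin vision encoder as `model.encoder`, so you can call it directly on the processed pixel values and never touch the decoder. The blank `Image.new` input is a stand-in for a real document image, and the mean-pooling step is just one possible way to collapse per-patch features into a single vector per image.

```python
import torch
from PIL import Image
from transformers import DonutProcessor, VisionEncoderDecoderModel

# Public base checkpoint; replace with your own fine-tuned Donut model.
checkpoint = "naver-clova-ix/donut-base"
processor = DonutProcessor.from_pretrained(checkpoint)
model = VisionEncoderDecoderModel.from_pretrained(checkpoint)
model.eval()

# Stand-in image; replace with Image.open("your_document.png").convert("RGB").
image = Image.new("RGB", (1200, 900), "white")
pixel_values = processor(image, return_tensors="pt").pixel_values

with torch.no_grad():
    # Run only the vision (Swin) encoder; the text decoder is never invoked.
    encoder_outputs = model.encoder(pixel_values=pixel_values)

# Per-patch embeddings: shape (batch, num_patches, hidden_size).
embeddings = encoder_outputs.last_hidden_state

# One option for a single vector per image: mean-pool over the patch axis.
image_embedding = embeddings.mean(dim=1)
print(embeddings.shape, image_embedding.shape)
```

`embeddings` is exactly what the decoder would cross-attend to during text generation, so it should carry the visual information you want for a downstream pipeline; whether to mean-pool or keep the full patch sequence depends on what that pipeline expects.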


I am also looking for an answer to this; please let me know if you find a solution.