VisionEncoderDecoder/TrOCR

Hi @nielsr,
I am currently trying to set up your VisionEncoderDecoderModel with a pretrained BERT model as the decoder, but I am struggling with the model.generate (greedy search) part.

I have put everything into a Colab notebook (otherwise it would be too much to post directly):
Colab Notebook (Note: it is open if anyone is interested in collaborating)

Questions:
What is the difference between decoder_input_ids and labels? (I think the decoder_input_ids are the tokenized labels used for training, and the labels are the complete vocab ids, right?) (I ask because for the EncoderDecoder model the tokenized labels are used for both decoder_input_ids and labels.)
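
To make it concrete, here is a minimal sketch of how I currently understand the labels flow (the ViT/BERT checkpoint names and the random pixel_values are only placeholders, my real setup is in the notebook):

```python
import torch
from transformers import BertTokenizer, VisionEncoderDecoderModel

# Placeholder checkpoints (my real setup is in the notebook)
model = VisionEncoderDecoderModel.from_encoder_decoder_pretrained(
    "google/vit-base-patch16-224-in21k", "bert-base-uncased"
)
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

model.config.decoder_start_token_id = tokenizer.cls_token_id
model.config.pad_token_id = tokenizer.pad_token_id

# labels = the tokenized ground-truth text (vocab ids)
labels = tokenizer("some ground truth text", return_tensors="pt").input_ids

# stand-in for a preprocessed image batch
pixel_values = torch.randn(1, 3, 224, 224)

# When only labels are passed, the model builds decoder_input_ids itself by
# shifting the labels one position to the right (teacher forcing), so no
# explicit decoder_input_ids are passed here.
outputs = model(pixel_values=pixel_values, labels=labels)
print(outputs.loss)
```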

I would like to track the CER / WER metrics in the validation step. Is there another way to do this without model.generate?
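
In case it matters, this teacher-forced approximation is the only alternative I could come up with so far (reusing the objects from the sketch above, and using jiwer only as an example metrics library):

```python
import torch
from jiwer import cer, wer  # jiwer is just one option for the metrics

# Reusing model, tokenizer, pixel_values and labels from the sketch above
with torch.no_grad():
    outputs = model(pixel_values=pixel_values, labels=labels)

# Teacher-forced approximation: argmax over the logits at every position.
# No model.generate call, but the decoder sees the gold previous tokens,
# so the resulting CER/WER is usually optimistic compared to real decoding.
pred_ids = outputs.logits.argmax(-1)
pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)

label_ids = labels.clone()
label_ids[label_ids == -100] = tokenizer.pad_token_id  # undo loss masking, if used
label_str = tokenizer.batch_decode(label_ids, skip_special_tokens=True)

print("CER:", cer(label_str, pred_str))
print("WER:", wer(label_str, pred_str))
```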

Do you see a way to export this model to ONNX format after training? I think for this to work I would need to implement the greedy search by hand, or is there already a solution for this in the transformers library?
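
What I imagine for the greedy part would be a hand-rolled loop over the plain forward pass, something like this sketch (it assumes eos_token_id has been set and it re-runs the encoder every step, so it is certainly not efficient):

```python
import torch

# Minimal hand-rolled greedy loop over the plain forward pass, assuming
# model, tokenizer and pixel_values from above, and that
# model.config.eos_token_id is set (e.g. to tokenizer.sep_token_id for BERT).
# The idea would be to export the forward pass to ONNX and drive a loop
# like this outside the graph.
def greedy_decode(model, pixel_values, max_length=64):
    batch_size = pixel_values.shape[0]
    decoder_input_ids = torch.full(
        (batch_size, 1), model.config.decoder_start_token_id, dtype=torch.long
    )
    for _ in range(max_length):
        # Re-encodes the image on every step; good enough for a sketch
        logits = model(
            pixel_values=pixel_values, decoder_input_ids=decoder_input_ids
        ).logits
        next_token = logits[:, -1, :].argmax(-1, keepdim=True)
        decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
        if (next_token == model.config.eos_token_id).all():
            break
    return decoder_input_ids

pred_ids = greedy_decode(model, pixel_values)
print(tokenizer.batch_decode(pred_ids, skip_special_tokens=True))
```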

Thanks a lot :hugs: