Using EncoderDecoderModel

The VisionEncoderDecoderModel class is now available, and so is TrOCR. See the "Vision Encoder Decoder Models" page in the transformers 4.12.0.dev0 documentation.
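
For anyone who wants to try it, here is a minimal sketch of wiring an image encoder to a text decoder with this class. It assumes `torch` and a `transformers` version that ships `VisionEncoderDecoderModel` are installed. The ViT encoder, the BERT decoder, and all the tiny config sizes are my own illustrative choices, not something from the docs page. The weights are randomly initialized, so the output is meaningless; the point is only to show how the pieces fit together.

```python
import torch
from transformers import (
    BertConfig,
    ViTConfig,
    VisionEncoderDecoderConfig,
    VisionEncoderDecoderModel,
)

# Tiny, randomly initialized configs (assumed sizes, chosen only to keep this fast).
encoder_config = ViTConfig(
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    image_size=32,
    patch_size=16,
)
decoder_config = BertConfig(
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    vocab_size=100,
)

# Combine both configs; this marks the decoder as a decoder with cross-attention.
config = VisionEncoderDecoderConfig.from_encoder_decoder_configs(
    encoder_config, decoder_config
)
model = VisionEncoderDecoderModel(config=config)
model.eval()

# One fake RGB image and a short, arbitrary decoder prompt of three token ids.
pixel_values = torch.randn(1, 3, 32, 32)
decoder_input_ids = torch.tensor([[1, 5, 7]])

with torch.no_grad():
    outputs = model(pixel_values=pixel_values, decoder_input_ids=decoder_input_ids)

# Logits have shape (batch, decoder sequence length, decoder vocab size).
logits_shape = tuple(outputs.logits.shape)
print(logits_shape)
```

For real use you would more likely load pretrained weights (for example a TrOCR checkpoint with `from_pretrained`) instead of building configs by hand.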