Separate image encoding and decoding in pretrained decoder-only models

Hi, I am currently trying to separate the image encoding and decoding phases of Qwen2.5-VL. As can be seen in the code here, transformers/src/transformers/models/qwen2_5_vl/modular_qwen2_5_vl.py at 307c5238546ba1675daabc46050c63ffde25f8e6 · huggingface/transformers · GitHub, the image embedding is computed inside the forward function (lines 580-586). However, I would like the image embedding to be computed in a separate container, with the result then passed into the model, which would only do the decoding. How would I do something like this?
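
To make the question concrete, here is roughly the split I have in mind. This is only a sketch I put together from reading the linked forward code, not something I have fully tested; the checkpoint name, the image path, and the `encoded_request.pt` file used as the hand-off between containers are just placeholders.

The first container would run only the vision tower and ship the resulting image embeddings (plus the tokenized prompt and `image_grid_thw`) to the decoding container:

```python
# Container A: vision encoding only (sketch; checkpoint and image names are placeholders).
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
processor = AutoProcessor.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Describe this image."},
]}]
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[Image.open("demo.jpg")], return_tensors="pt").to(model.device)

with torch.no_grad():
    # This mirrors what forward() does at the lines linked above: run only the
    # vision tower on the pixel values. (Depending on the transformers version,
    # the tower may live at model.visual or model.model.visual.)
    pixel_values = inputs["pixel_values"].type(model.visual.dtype)
    image_embeds = model.visual(pixel_values, grid_thw=inputs["image_grid_thw"])

# Hand-off to the decoding container; torch.save over shared storage is just a
# stand-in for whatever transport (RPC, queue, ...) would actually be used.
torch.save(
    {
        "input_ids": inputs["input_ids"].cpu(),
        "attention_mask": inputs["attention_mask"].cpu(),
        "image_grid_thw": inputs["image_grid_thw"].cpu(),
        "image_embeds": image_embeds.cpu(),
    },
    "encoded_request.pt",
)
```

The second container would then redo only the merge step from the linked lines (embed the text tokens and `masked_scatter` the precomputed image features over the image placeholder positions) and run the language model on the resulting `inputs_embeds`:

```python
# Container B: decoding only (sketch; consumes the payload written by container A).
import torch
from transformers import AutoTokenizer, Qwen2_5_VLForConditionalGeneration

model_id = "Qwen/Qwen2.5-VL-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

payload = torch.load("encoded_request.pt")
input_ids = payload["input_ids"].to(model.device)
attention_mask = payload["attention_mask"].to(model.device)
image_grid_thw = payload["image_grid_thw"].to(model.device)
image_embeds = payload["image_embeds"].to(model.device)

# Same merge as in the linked forward(): embed the text tokens, then scatter the
# precomputed image features over the <|image_pad|> placeholder positions.
inputs_embeds = model.get_input_embeddings()(input_ids)
image_mask = (input_ids == model.config.image_token_id).unsqueeze(-1).expand_as(inputs_embeds)
inputs_embeds = inputs_embeds.masked_scatter(image_mask, image_embeds.to(inputs_embeds.dtype))

with torch.no_grad():
    # Since inputs_embeds is provided and pixel_values is not, forward() skips the
    # vision tower; input_ids and image_grid_thw are still passed so the M-RoPE
    # position ids come out the same as in the single-container setup.
    outputs = model(
        input_ids=input_ids,
        inputs_embeds=inputs_embeds,
        attention_mask=attention_mask,
        image_grid_thw=image_grid_thw,
    )

print(tokenizer.decode(outputs.logits[:, -1].argmax(dim=-1)))
```

A plain forward pass like this seems to work, but I am not sure how to hook the same idea into generate(): if I read the generation code correctly, the prefill step passes inputs_embeds without input_ids, so the M-RoPE position ids would no longer be computed from the image grid. Also, this sketch still loads the full Qwen2_5_VLForConditionalGeneration in both containers, whereas ideally container A would hold only the vision tower and container B only the language model. Is splicing into inputs_embeds like this the right direction, or is there a cleaner hook for it?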