Fine-tuning a multimodal LLM for document analysis

We have built a document data extraction project by fine-tuning LayoutLM-series models. The process works like this: first, an OCR engine extracts the words and their bounding boxes. Then the fine-tuned model predicts the entity types and builds the relationships between these entities.
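To make the pipeline concrete, here is a minimal sketch of the steps around the model, not our actual code. It assumes the OCR engine returns pixel-space boxes as (x0, y0, x1, y1), that boxes are rescaled to the 0-1000 range that LayoutLM models expect, and that the model emits one BIO tag per word. The names `OcrWord`, `normalize_bbox`, `group_entities`, and the `QUESTION`/`ANSWER` labels are hypothetical, chosen only for illustration. The relation-building step is not shown.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

BBox = Tuple[int, int, int, int]  # (x0, y0, x1, y1)


@dataclass
class OcrWord:
    """One word from the OCR engine (hypothetical container)."""
    text: str
    bbox: BBox  # pixel coordinates on the page image


def normalize_bbox(bbox: BBox, page_width: int, page_height: int) -> BBox:
    """Rescale a pixel-space box to the 0-1000 range used by LayoutLM."""
    x0, y0, x1, y1 = bbox
    return (
        int(1000 * x0 / page_width),
        int(1000 * y0 / page_height),
        int(1000 * x1 / page_width),
        int(1000 * y1 / page_height),
    )


def group_entities(words: List[OcrWord], labels: List[str]) -> List[Dict[str, str]]:
    """Merge per-word BIO tags into entities with a type and joined text."""
    entities: List[Dict[str, str]] = []
    current = None
    for word, label in zip(words, labels):
        if label.startswith("B-"):
            # A B- tag always starts a new entity.
            if current is not None:
                entities.append(current)
            current = {"type": label[2:], "text": word.text}
        elif (
            label.startswith("I-")
            and current is not None
            and current["type"] == label[2:]
        ):
            # An I- tag of the same type extends the open entity.
            current["text"] += " " + word.text
        else:
            # "O" tags and stray I- tags close any open entity.
            if current is not None:
                entities.append(current)
            current = None
    if current is not None:
        entities.append(current)
    return entities


if __name__ == "__main__":
    # Hypothetical OCR output for a 1000 x 500 pixel page.
    words = [
        OcrWord("Invoice", (100, 50, 200, 80)),
        OcrWord("No:", (210, 50, 260, 80)),
        OcrWord("12345", (300, 50, 380, 80)),
    ]
    # Hypothetical per-word predictions from the fine-tuned model.
    labels = ["B-QUESTION", "I-QUESTION", "B-ANSWER"]

    boxes = [normalize_bbox(w.bbox, 1000, 500) for w in words]
    print(boxes)
    print(group_entities(words, labels))
```

In this setup the normalized boxes go into the model together with the words, and `group_entities` runs on the model output to turn word-level tags into entity spans.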

Now we are wondering: is there a multimodal LLM that can be fine-tuned on our own source documents?