Fundoscopy Image Interpretation and an Advanced AI Model for Medical Consultation

Hi there,
I have over 100,000 fundus images in PDF and Word files, along with some videos of the fundus (the back of the eye). I'm looking for a suitable AI model to process and automatically interpret these fundus images. I also have roughly 5,000 written explanations of such images in my Word documents. Could anyone recommend which of the 15 available Vision Transformer models is best suited to this task?
I am also looking for an advanced large language model (LLM) that can give professional, accurate answers to questions from patients. Patients will submit their questions through specialized forms, which I prepared from knowledge documented in more than 10,000 Word files. The AI should be able to answer patients and correct their misconceptions about conditions such as epilepsy, neurosis, vascular diseases, nervous system disorders, spinal cord injuries, and more.
Could anyone also recommend a suitable LLM to start with for this purpose? Thank you!