Advancements in computer vision and deep learning carry the potential to make significant contributions to healthcare. However, the current state-of-the-art models for automated diagnosis and outcome prediction from medical imaging tend not to exploit additional information such as the accompanying medical reports.
A multimodal model like CLIP pre-trained on medical data could enable new medical applications that combine text and images.
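As a sketch of the objective such a model would be trained with, the following is a NumPy version of CLIP's symmetric contrastive (InfoNCE) loss over a batch of paired image/report embeddings. The function name and batch layout are illustrative, not taken from an existing codebase; an actual training script would use the deep-learning framework's own implementation.

```python
import numpy as np

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss for a batch where image i is paired with text i."""
    # L2-normalize so dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    # Pairwise similarity matrix: logits[i, j] = sim(image_i, text_j).
    logits = image_emb @ text_emb.T / temperature

    def log_softmax(x):
        # Numerically stable row-wise log-softmax.
        x = x - x.max(axis=1, keepdims=True)
        return x - np.log(np.exp(x).sum(axis=1, keepdims=True))

    # Matched pairs sit on the diagonal, so the target for row i is class i.
    loss_i2t = -np.mean(np.diag(log_softmax(logits)))      # image -> text
    loss_t2i = -np.mean(np.diag(log_softmax(logits.T)))    # text -> image
    return (loss_i2t + loss_t2i) / 2
```

The loss pulls each image embedding toward its own report and pushes it away from every other report in the batch, and symmetrically for reports.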
The MIMIC-CXR dataset can be used for this task. For privacy reasons, access to the dataset is restricted: anyone who wants to participate in this project must first obtain the necessary credentials.
In my experience, getting access to MIMIC-CXR is not particularly complicated: you need to accept the terms of the license and complete a short course on medical data management. Obtaining the credentials normally takes about two weeks.
A training script for this will be provided soon. (see PR )
Carrying out a proper evaluation of the model may be difficult.
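One concrete option is to evaluate the model on image-report retrieval with Recall@K. Below is a minimal sketch; the helper name is hypothetical, and it assumes paired embeddings where report i is the correct match for image i.

```python
import numpy as np

def recall_at_k(image_emb, text_emb, k=5):
    """Fraction of images whose matching report ranks in the top-k by cosine similarity."""
    # L2-normalize so dot products are cosine similarities.
    image_emb = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    text_emb = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    sims = image_emb @ text_emb.T
    # For each image, take the indices of the k most similar reports.
    top_k = np.argsort(-sims, axis=1)[:, :k]
    # A hit means the correct report (index i for image i) is among them.
    hits = (top_k == np.arange(len(sims))[:, None]).any(axis=1)
    return hits.mean()
```

A random-embedding baseline gives Recall@K of roughly K divided by the number of candidates, which makes the metric easy to sanity-check.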