Multimodal fusion options - thoughts?

simonthekelpie · May 6, 2025, 2:14am

Hi all,
I’m building a multimodal binary classification model for healthcare and I’m stuck. The modalities are tabular, text, and two imaging modalities. I’ll use modality-specific techniques for feature embedding, e.g. a CNN for images, a transformer for text, and GBDT for tabular data.
Now, here’s the tricky part: any subject can have missing data, e.g. text, tabular, and one image but not the other.
Can anyone suggest the best way to fuse the embeddings while taking the missing data into account? My thoughts so far are cross-attention, Tensor Fusion Network (TFN), or low-rank multimodal fusion, but how would any of these handle the missing-data issue?
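To make the question concrete, here is a minimal NumPy sketch of one possible direction: attention-style pooling over the per-modality embeddings, with a presence mask so that missing modalities get zero weight. It is only an illustration under assumptions, not a tested approach. The function name `masked_attention_fusion`, the fixed `query` vector (which would be a learned parameter in a real model), the shared embedding size of 8, and the random data are all hypothetical, and it assumes every subject has at least one modality present.

```python
import numpy as np

def masked_attention_fusion(embeddings, present, query):
    """Fuse per-modality embeddings, ignoring missing modalities.

    embeddings: (batch, n_modalities, dim) array; rows for missing
                modalities may hold any placeholder values.
    present:    (batch, n_modalities) boolean mask, True where the
                modality is available for that subject.
    query:      (dim,) scoring vector (hypothetical; learned in practice).
    Assumes each subject has at least one modality present.
    """
    # Zero out placeholders so missing rows cannot leak into the result.
    embeddings = np.where(present[..., None], embeddings, 0.0)

    # Scaled dot-product score for each modality.
    scores = embeddings @ query / np.sqrt(embeddings.shape[-1])

    # Missing modalities get -inf, so softmax assigns them weight 0.
    scores = np.where(present, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Weighted sum over the modality axis.
    fused = (weights[..., None] * embeddings).sum(axis=1)
    return fused, weights

# Toy data: 2 subjects, 4 modalities (tabular, text, image A, image B).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(2, 4, 8))
query = rng.normal(size=8)
present = np.array([
    [True, True, True, False],   # subject 0 is missing image B
    [True, True, True, True],    # subject 1 has everything
])

fused, weights = masked_attention_fusion(embeddings, present, query)
```

The fused vector has the same size regardless of which modalities are present, so it can feed a single classification head. The open question is whether this kind of masking carries over cleanly to TFN or low-rank fusion, where the modalities are combined multiplicatively rather than by a weighted sum.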
Thanks
