Hello! I am a student currently working on developing a multimodal transformer model with unimodal encoders feeding into a bottleneck fusion layer. This is my first time working with multimodality and I was confused while learning some background information: How is the full model trained? Since e…

Dimensionality matching in multimodal transformer model

Felicitywood July 12, 2025, 7:37am 3

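Have you tried adding an auxiliary loss on each unimodal encoder before fusion? That way every encoder gets its own training signal in addition to the gradient that flows back through the fusion layer.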
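
In case it helps, here is a rough PyTorch sketch of what I mean. Everything in it is an assumption made up for illustration, not your actual architecture: the class name `ToyBottleneckModel`, the linear layers standing in for your unimodal transformer encoders and the bottleneck fusion layer, the feature sizes, the shared classification target, and the `aux_weight` value of 0.3.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ToyBottleneckModel(nn.Module):
    """Hypothetical two-modality model: unimodal encoders -> fusion -> classifier."""

    def __init__(self, dim_a=32, dim_b=48, d_model=16, num_classes=4):
        super().__init__()
        # Stand-ins for the unimodal transformer encoders (assumption).
        # Each projects its modality to the same width d_model.
        self.enc_a = nn.Sequential(nn.Linear(dim_a, d_model), nn.ReLU())
        self.enc_b = nn.Sequential(nn.Linear(dim_b, d_model), nn.ReLU())
        # Auxiliary heads that read each encoder's output before fusion.
        self.aux_head_a = nn.Linear(d_model, num_classes)
        self.aux_head_b = nn.Linear(d_model, num_classes)
        # Stand-in for the bottleneck fusion layer (assumption).
        self.fusion = nn.Linear(2 * d_model, d_model)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, x_a, x_b):
        h_a = self.enc_a(x_a)
        h_b = self.enc_b(x_b)
        fused = torch.relu(self.fusion(torch.cat([h_a, h_b], dim=-1)))
        # Return the fused prediction plus one auxiliary prediction per encoder.
        return self.head(fused), self.aux_head_a(h_a), self.aux_head_b(h_b)


def total_loss(outputs, target, aux_weight=0.3):
    """Main loss on the fused prediction plus weighted per-encoder auxiliary losses."""
    logits, aux_a, aux_b = outputs
    main = F.cross_entropy(logits, target)
    aux = F.cross_entropy(aux_a, target) + F.cross_entropy(aux_b, target)
    return main + aux_weight * aux


torch.manual_seed(0)
model = ToyBottleneckModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Random toy batch: 8 samples per modality, 4 classes.
x_a = torch.randn(8, 32)
x_b = torch.randn(8, 48)
y = torch.randint(0, 4, (8,))

# One end-to-end training step: a single backward pass updates the encoders,
# the auxiliary heads, the fusion layer, and the final classifier together.
loss = total_loss(model(x_a, x_b), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The auxiliary heads are only needed during training, so you can ignore their outputs at inference time. How large `aux_weight` should be is something you would have to tune for your data.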
