The standard TimeSformer takes 3-channel (RGB) input images, while I have 4-channel images (RGBD).
I am struggling to find a TimeSformer (or a similar video model) that accepts 4-channel input images.
Does anybody know of such a model?
Preferably, I would like to find a pretrained model with weights.
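
If no such model exists, one possible fallback is to adapt a 3-channel checkpoint by widening the patch-embedding convolution to 4 input channels. Below is a minimal sketch in plain PyTorch, not tied to any particular TimeSformer implementation. It assumes the patch embedding is an ordinary `nn.Conv2d` with `groups=1`. The helper name `widen_conv_in_channels` is hypothetical, and initializing the depth channel with the mean of the RGB filters is just one heuristic, not an established recipe for this model.

```python
import torch
import torch.nn as nn


def widen_conv_in_channels(conv: nn.Conv2d, new_in_channels: int = 4) -> nn.Conv2d:
    """Return a copy of `conv` that accepts `new_in_channels` input channels.

    Hypothetical helper: pretrained filters are copied for the existing
    channels; any extra channel starts as the mean of the existing filters.
    """
    old_in = conv.in_channels
    if conv.groups != 1:
        raise ValueError("this sketch only handles groups=1")
    if new_in_channels < old_in:
        raise ValueError("new_in_channels must be >= conv.in_channels")

    new_conv = nn.Conv2d(
        new_in_channels,
        conv.out_channels,
        kernel_size=conv.kernel_size,
        stride=conv.stride,
        padding=conv.padding,
        dilation=conv.dilation,
        bias=conv.bias is not None,
    )
    with torch.no_grad():
        # Keep the pretrained RGB filters unchanged.
        new_conv.weight[:, :old_in] = conv.weight
        if new_in_channels > old_in:
            # Heuristic (assumption): seed extra channels with the mean RGB filter.
            mean_filter = conv.weight.mean(dim=1, keepdim=True)
            new_conv.weight[:, old_in:] = mean_filter
        if conv.bias is not None:
            new_conv.bias.copy_(conv.bias)
    return new_conv


# Stand-in for a pretrained 3-channel patch-embedding layer (16x16 patches).
patch_embed = nn.Conv2d(3, 768, kernel_size=16, stride=16)
rgbd_embed = widen_conv_in_channels(patch_embed, 4)

# Frames are flattened to (batch * time, channels, height, width).
frames = torch.randn(2, 4, 224, 224)
tokens = rgbd_embed(frames)
```

The remaining layers would keep their pretrained weights, but the model would most likely still need fine-tuning on RGBD data, since the depth channel has different statistics from RGB.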