Hi, can someone please explain whether the Vision Transformer uses a bidirectional attention mechanism between one patch embedding and another? If not, how can we add one?
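For context, here is a minimal NumPy sketch of the unmasked softmax attention used inside a standard ViT encoder block (the weight names `Wq`/`Wk` and the tiny sizes are illustrative, not from any real implementation). Because no causal mask is applied, every patch token attends to every other patch token in both directions, which is what makes the attention bidirectional by default:

```python
import numpy as np

rng = np.random.default_rng(0)
num_patches, dim = 4, 8
x = rng.normal(size=(num_patches, dim))  # stand-in patch embeddings

# Illustrative query/key projections (randomly initialized for the sketch)
Wq = rng.normal(size=(dim, dim))
Wk = rng.normal(size=(dim, dim))
q, k = x @ Wq, x @ Wk

# Scaled dot-product scores; note there is NO causal/triangular mask here
scores = q @ k.T / np.sqrt(dim)
attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
attn /= attn.sum(axis=-1, keepdims=True)

# Every entry is strictly positive: patch i attends to patch j AND
# patch j attends to patch i, i.e., the attention is bidirectional.
print(np.all(attn > 0))  # → True
```

A causal (autoregressive) model would instead add `-inf` to the upper triangle of `scores` before the softmax; the absence of that step is exactly what makes encoder-style ViT attention bidirectional.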