Bidirectional attention mechanism in vision transformers

Hi, can someone please explain whether the vision transformer (ViT) uses a bidirectional attention mechanism between patch embeddings, i.e., whether each patch embedding can attend to every other patch embedding in both directions rather than only to earlier ones? If not, how can we add it?
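
For concreteness, here is a minimal sketch of what I understand the standard ViT attention setup to be (my own illustration using PyTorch's `nn.MultiheadAttention`; the shapes and names are just assumptions for the example):

```python
import torch
import torch.nn as nn

# A single self-attention layer over a sequence of patch embeddings.
# With no attn_mask passed, every patch can attend to every other
# patch in both directions -- is this what counts as "bidirectional"?

batch, num_patches, dim = 2, 196, 768  # e.g. 14x14 patches at dim 768
patches = torch.randn(batch, num_patches, dim)

attn = nn.MultiheadAttention(embed_dim=dim, num_heads=12, batch_first=True)

# Self-attention: queries, keys, and values all come from the patches.
# No causal mask is given, so the attention weights form a full
# (num_patches x num_patches) matrix rather than a lower-triangular one.
out, weights = attn(patches, patches, patches, need_weights=True)

print(weights.shape)  # (batch, num_patches, num_patches): dense, unmasked
```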