Perceiver IO Masked Language

Hello,

I was wondering where in the code of perceiver io, the model identifies that there is a masked input to predict ? I know that the mask is reserved as the number 3, in their original example but after adding positional encodings and all that how is it able to identify which input is masked and which is not, to finally output a prediction for a specific input?

Thanks !