For BERT LMs ... are the random tasks created on just the first sentence or the second as well?

Just wondering whether the 15% of tokens to mask are selected randomly from the first sentence only, or from the entire sequence (first and second sentences)?

It looks like every input token except the special tokens is a candidate for masking.

So yes, the second sentence will have tokens masked as well, but the [SEP] between it and the first sentence (and the other special tokens) will not be.

Why I think so:
It looks like probability_matrix is just a tensor with the same shape as the (entire) input, filled with the 15% probability of being masked, except at the special-token positions, which are set to a 0% probability of masking before the matrix is passed to torch.bernoulli.
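To make that concrete, here is a minimal sketch of the selection step as I read it. It is not the library's actual code: the function name `sample_mlm_mask`, the token ids, and the hand-written `special_tokens_mask` are all made up for illustration, and it only covers choosing which positions get selected, not what they are replaced with.

```python
import torch

def sample_mlm_mask(input_ids, special_tokens_mask, mlm_probability=0.15):
    """Return a boolean tensor marking the positions selected for masking.

    Hypothetical helper for illustration only.
    """
    # Same selection probability at every position of the whole input,
    # so first and second sentence are treated identically.
    probability_matrix = torch.full(input_ids.shape, mlm_probability)
    # Special tokens get probability 0, so they can never be selected.
    probability_matrix.masked_fill_(special_tokens_mask, value=0.0)
    # One independent Bernoulli draw per position.
    return torch.bernoulli(probability_matrix).bool()

# Illustrative ids laid out as: [CLS] a b c [SEP] d e f [SEP]
input_ids = torch.tensor([[101, 7, 8, 9, 102, 10, 11, 12, 102]])
# Assumed mask: True at the [CLS] and [SEP] positions.
special_tokens_mask = torch.tensor(
    [[True, False, False, False, True, False, False, False, True]]
)

masked_indices = sample_mlm_mask(input_ids, special_tokens_mask)
print(masked_indices)  # varies per run; special positions are always False
```

Since the draw is made per position over the whole tensor, tokens at positions 5 to 7 (the second sentence in this toy input) have the same 15% chance of being picked as those in the first sentence.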

