For BERT LMs ... are the random masks created on just the first sentence or the second as well?

Just wondering whether the 15% of tokens to mask are selected randomly from the first sentence only, or over the entire sequence (first and second sentences)?

Thanks - wg

Looks like all input tokens, except for the special tokens, are candidates for masking.

So yes, the second sentence will have tokens masked as well, but not the [SEP] between it and the first sentence (nor the other special tokens).
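For example, here is a quick way to see which positions count as special tokens for a sentence pair (a minimal sketch using `BertTokenizerFast` from transformers; the example sentences are just placeholders):

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
enc = tokenizer("First sentence.", "Second sentence.",
                return_special_tokens_mask=True)

print(tokenizer.convert_ids_to_tokens(enc["input_ids"]))
# ['[CLS]', 'first', 'sentence', '.', '[SEP]', 'second', 'sentence', '.', '[SEP]']
print(enc["special_tokens_mask"])
# [1, 0, 0, 0, 1, 0, 0, 0, 1]  -> 1 marks positions that are never masked
```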

Why I think so:
Looks like probability_matrix is just a tensor with the shape of the (whole) input, filled with the 15% probability of being masked, except for the special tokens, which are set to a 0% probability of masking before the matrix is passed to torch.bernoulli here:
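A minimal sketch of that step (paraphrasing the relevant lines of `torch_mask_tokens` in transformers' `DataCollatorForLanguageModeling`, not the verbatim source):

```python
import torch

def mask_tokens(inputs, special_tokens_mask, mlm_probability=0.15):
    """Select positions to mask: every token starts with a 15% chance,
    special tokens get 0%, then torch.bernoulli draws the actual mask."""
    labels = inputs.clone()
    # Tensor the shape of the inputs, every entry = 0.15
    probability_matrix = torch.full(labels.shape, mlm_probability)
    # Zero out the probability at special-token positions ([CLS], [SEP], [PAD], ...)
    probability_matrix.masked_fill_(special_tokens_mask.bool(), value=0.0)
    # Sample which positions end up masked
    masked_indices = torch.bernoulli(probability_matrix).bool()
    labels[~masked_indices] = -100  # loss is only computed on masked tokens
    return masked_indices, labels

# e.g. with `enc` from the tokenizer example above:
# masked, labels = mask_tokens(torch.tensor([enc["input_ids"]]),
#                              torch.tensor([enc["special_tokens_mask"]]))
```

In the real collator, the selected positions are then replaced with [MASK] 80% of the time, with a random token 10% of the time, and left unchanged 10% of the time, but the selection step above is where the special tokens are excluded.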
