Just wondering if the 15% masked tokens are selected randomly on the first sentence or over the entire sequence (first and 2nd sentence)?
Thanks - wg
Looks like all input tokens, except for the special tokens, are eligible for masking. So yes, the second sentence will have tokens masked as well, but not the [SEP] between it and the first sentence (or the other special tokens).
Why I think so: probability_matrix looks like it is just a tensor with the shape of (all of) the inputs, filled with the 15% probability of being masked, except at the special-token positions, which are set to a 0% probability of masking before the matrix is passed to torch.bernoulli here:
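To make that concrete, here is a minimal sketch of that selection step (simplified from the logic in HuggingFace's DataCollatorForLanguageModeling, not the exact source; the function name and toy token ids are illustrative):

```python
import torch

def select_masked_indices(input_ids, special_tokens_mask, mlm_probability=0.15):
    """Sketch: pick which positions get masked, skipping special tokens."""
    # Tensor the same shape as the inputs, filled with the 15% probability.
    probability_matrix = torch.full(input_ids.shape, mlm_probability)
    # Special-token positions ([CLS], [SEP], [PAD]) get 0% probability.
    probability_matrix.masked_fill_(special_tokens_mask.bool(), value=0.0)
    # Each position is then sampled independently.
    masked_indices = torch.bernoulli(probability_matrix).bool()
    return masked_indices

# Toy batch: [CLS] tok tok [SEP] tok tok [SEP]
input_ids = torch.tensor([[101, 2023, 2003, 102, 2178, 6251, 102]])
special = torch.tensor([[1, 0, 0, 1, 0, 0, 1]])
masked = select_masked_indices(input_ids, special)
# Special-token positions can never be selected (probability 0),
# while tokens in *both* sentences can be.
assert not masked[special.bool()].any()
```

Because the probability matrix spans the whole input, the Bernoulli draw treats the first and second sentence identically; only the zeroed-out special-token positions are exempt.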