Variable num_predict in target_mapping for XLNet

I might be wrong here but I would assume that in the language modeling task, num_predict is actually the size of the vocabulary because for each mask you try to predict the highest probability token in the vocab (in MLM). Seq_len is the max length of sequences you want to be able to model. If a sentence is smaller then you just pad it.