I have a pretraining task for XLNet. One of the inputs to XLNetLMHeadModel is target_mapping, which has the shape (batch_size, num_predict, seq_len).
I want to predict all tokens in an input sentence, which means num_predict will vary within a batch for sentences of different lengths. This leads to an error when building a DataLoader in PyTorch. Can anyone suggest a workaround for this problem?
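For concreteness, here is a minimal sketch of what fails (toy lengths; target_mapping built as one-hot rows, one per token to predict, as the docs describe):

```python
import torch
from torch.utils.data import DataLoader

seq_len = 10
# Predicting every token needs one one-hot row per token, so a 5-token
# and a 7-token sentence yield target_mappings with different first dims:
dataset = [
    {"target_mapping": torch.eye(5, seq_len)},  # num_predict = 5
    {"target_mapping": torch.eye(7, seq_len)},  # num_predict = 7
]

loader = DataLoader(dataset, batch_size=2)
next(iter(loader))
# RuntimeError: stack expects each tensor to be equal size,
# but got [5, 10] at entry 0 and [7, 10] at entry 1
```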
Thanks
I might be wrong here, but I would assume that in the language modeling task, num_predict is actually the size of the vocabulary, because for each mask you try to predict the highest-probability token in the vocabulary (as in MLM). seq_len is the maximum length of the sequences you want to be able to model; if a sentence is shorter, you just pad it.
Dear @BramVanroy, thank you for the reply, but num_predict is not the size of the vocabulary. It’s the number of predictions to be made for that particular input sentence. seq_len is not the issue because, as you pointed out, it can be padded, but I am not sure whether the same is true for num_predict, hence this question.
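Concretely, padding num_predict would mean something like the sketch below: all-zero rows in target_mapping plus, I assume, -100 labels so that CrossEntropyLoss’s default ignore_index drops them from the loss. I do not know whether XLNet handles the zero rows correctly, so treat this as an unverified sketch:

```python
import torch
from torch.utils.data import DataLoader

def pad_predictions(batch):
    # Hypothetical collate_fn: pad the num_predict dimension so every
    # example in the batch has the same number of prediction rows.
    max_predict = max(item["target_mapping"].size(0) for item in batch)
    seq_len = batch[0]["target_mapping"].size(1)
    mappings, labels = [], []
    for item in batch:
        pad = max_predict - item["target_mapping"].size(0)
        # All-zero rows select no position (assumption: XLNet tolerates them).
        mappings.append(torch.cat(
            [item["target_mapping"], torch.zeros(pad, seq_len)]))
        # -100 is CrossEntropyLoss's default ignore_index, so the padded
        # predictions should not contribute to the loss.
        labels.append(torch.cat(
            [item["labels"], torch.full((pad,), -100, dtype=torch.long)]))
    return {"target_mapping": torch.stack(mappings),
            "labels": torch.stack(labels)}

# Ragged toy examples: 5 and 7 predictions over seq_len = 10.
dataset = [
    {"target_mapping": torch.eye(5, 10), "labels": torch.arange(5)},
    {"target_mapping": torch.eye(7, 10), "labels": torch.arange(7)},
]
loader = DataLoader(dataset, batch_size=2, collate_fn=pad_predictions)
print(next(iter(loader))["target_mapping"].shape)  # torch.Size([2, 7, 10])
```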
You are absolutely right.
But if you want to predict all tokens, can’t you just leave target_mapping at the default (None)? From the docs:
If target_mapping is None, then num_predict corresponds to sequence_length.
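To make that concrete, here is a minimal sketch of the target_mapping=None route (note that this omits perm_mask, so it only illustrates the shapes, not a proper permutation-LM objective; the -100 label padding relies on CrossEntropyLoss’s default ignore_index):

```python
import torch
from transformers import XLNetLMHeadModel, XLNetTokenizer

tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetLMHeadModel.from_pretrained("xlnet-base-cased")

# Pad both sentences to the same seq_len; with target_mapping=None,
# num_predict defaults to seq_len, so labels are (batch_size, seq_len).
batch = tokenizer(
    ["The first sentence.", "A somewhat longer second sentence."],
    padding=True, return_tensors="pt",
)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # ignore pad positions in the loss

outputs = model(**batch, labels=labels)  # no target_mapping passed
print(outputs.loss)
```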
Or is your point that this mapping does not take into account the different sequence lengths in the batch? If so, I cannot help: I do not have enough experience with XLNet. Perhaps someone else can chime in.