Question about tokenizer truncation length

I'm using the RoBERTa-large model to train a masked language model. Generally, there are '<mask>' tokens in the input of an MLM. But what if the input is too long and the tokenizer cuts the '<mask>' token off? Does this cause a problem? In my opinion, during training, cutting off '<mask>' means this input doesn't contribute to the loss. During inference, if there is no '<mask>' in the input, the MLM doesn't know what to predict, so it just outputs the same as the input. Am I right?
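
For concreteness, here is a small sketch of the situation I mean, with a made-up text and an artificially small max length so that the '<mask>' near the end gets truncated away:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("roberta-large")

# A long input whose '<mask>' sits near the end, plus a tiny max_length
# so that truncation removes it.
text = "The capital of France is " * 20 + "<mask>."
encoding = tokenizer(text, truncation=True, max_length=16)

# The mask token id is no longer in the truncated input.
print(tokenizer.mask_token_id in encoding["input_ids"])  # False
```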
Please forgive my language; I'm not a native speaker.

The Hugging Face example scripts will usually not truncate the texts and will instead group them. If your max length is 512 and your examples have sequence lengths of 100, 200, 300, 700, 800, and 900, they will be concatenated and regrouped into chunks of 512 rather than each example being truncated on its own. Doing it this way will result in no truncated tokens.
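
As a rough sketch, the grouping step in the example scripts (e.g. run_mlm.py) looks something like this; the block size and exact field handling may differ slightly from the current scripts:

```python
def group_texts(examples, block_size=512):
    """Concatenate already-tokenized texts and split them into fixed-size chunks."""
    # Concatenate all sequences in the batch, field by field
    # (input_ids, attention_mask, ...).
    concatenated = {k: sum(examples[k], []) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])
    # Drop the remainder so the total is a multiple of block_size
    # (see the edit below).
    total_length = (total_length // block_size) * block_size
    # Split into chunks of block_size.
    return {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
```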

Edit: My original statement was not quite correct. If the total number of tokens is not a multiple of the chunk size, the remainder will be dropped. For example, 4,096 tokens would get chunked into 8 chunks of 512 with nothing dropped, while 5,000 tokens would get chunked into 9 chunks of 512 and the remaining 392 tokens would be dropped.
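
Using the grouping sketch above, the arithmetic is easy to check with dummy token ids:

```python
# 5,000 dummy token ids with a chunk size of 512.
dummy = {"input_ids": [list(range(5000))]}
chunks = group_texts(dummy, block_size=512)["input_ids"]

print(len(chunks))       # 9 chunks of 512 tokens each
print(5000 - 9 * 512)    # 392 leftover tokens are dropped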

Since 15% of the tokens are masked, it would be very rare for a chunk to contain no mask tokens. Even if it did, there wouldn't be any loss for that step, because there wouldn't be any predictions or labels.
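
Here is a minimal sketch of that mechanism with the standard data collator (15% masking, the default): positions that are not masked get the label -100, which the loss function ignores, so a chunk with no masked positions contributes nothing to the loss.

```python
from transformers import AutoTokenizer, DataCollatorForLanguageModeling

tokenizer = AutoTokenizer.from_pretrained("roberta-large")
collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

# Collate a single tokenized example; the collator applies random masking
# and builds the labels tensor.
batch = collator([tokenizer("A short example sentence for masked language modeling.")])

# Unmasked positions have label -100, which cross-entropy ignores.
print(batch["labels"])
```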