Which strategy is better for text pre-processing when training a transformer model?

I have a text dataset and I need to train an MLM on it. Which strategy is better for pre-processing the corpus?
1. Concatenate all the texts, tokenize the result, and then chunk it into 512-token blocks to feed the MLM.
2. Extract each sentence from the dataset, tokenize it separately, and then pad or truncate the token vectors to a fixed length.