Why split sequences into shorter chunks when pretraining an LLM?

In the Transformers documentation on language modeling (the Causal language modeling guide), the data preprocessing works as follows:

  • concatenate all the sequences
  • split the concatenated sequences into shorter chunks defined by block_size, which should be both shorter than the model's maximum input length and short enough for your GPU RAM (a rough sketch of this step is shown after this list).
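
For reference, here is a minimal sketch of that step as I understand it from the guide's `group_texts` function (exact names and details may differ from the doc); it is meant to be applied with `Dataset.map(batched=True)` over already-tokenized examples:

```python
from itertools import chain

block_size = 128  # illustrative value; pick one that fits your GPU RAM

def group_texts(examples):
    # Concatenate all tokenized sequences for each field (input_ids, attention_mask, ...).
    concatenated = {k: list(chain(*examples[k])) for k in examples.keys()}
    total_length = len(concatenated[list(examples.keys())[0]])
    # Drop the small remainder so every chunk is exactly block_size tokens long.
    total_length = (total_length // block_size) * block_size
    # Split the concatenated tokens into chunks of block_size.
    result = {
        k: [t[i : i + block_size] for i in range(0, total_length, block_size)]
        for k, t in concatenated.items()
    }
    # For causal LM, labels are a copy of the inputs (the model shifts them internally).
    result["labels"] = result["input_ids"].copy()
    return result

# Typical usage (assuming a tokenized datasets.Dataset called tokenized_ds):
# lm_ds = tokenized_ds.map(group_texts, batched=True)
```
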

This kind of processing produces many incomplete sentences, since chunk boundaries fall in the middle of text. I also noticed that the text used in the examples is mostly document-style data, such as wiki articles, where cutting sentences apart might not matter much. So when the text is in a question-and-answer format, i.e. a question followed by its answer, is this chunking operation still necessary?