Does HF support packing multiple sentences into one sequence, separated by `<eos>`, when training GPT, so as to save compute resources?

Something like the Megatron-LM [ltor](https://github.com/NVIDIA/Megatron-LM/blob/0bb597b42c53355a567aba2a1357cc34b9d99ddd/megatron/utils.py#L146) implementation.
I found that GPT-2 in HF only supports a causal attention mask.
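
To make the request concrete, here is a minimal plain-Python sketch of the masking I have in mind: a causal mask that is additionally blocked at every `<eos>`, with position ids restarting after each `<eos>`. The function name `packed_causal_mask` and the token ids are made up for illustration; this is not an HF or Megatron-LM API, and I am assuming the `<eos>` token itself belongs to the sentence it ends.

```python
def packed_causal_mask(token_ids, eos_id):
    """Hypothetical helper: build a per-sentence causal mask for one packed sequence.

    Returns (mask, position_ids), where mask[i][j] is True when token i
    may attend to token j.
    """
    n = len(token_ids)
    doc_ids = []       # index of the sentence each token belongs to
    position_ids = []  # position of each token inside its own sentence
    doc, pos = 0, 0
    for tok in token_ids:
        doc_ids.append(doc)
        position_ids.append(pos)
        pos += 1
        if tok == eos_id:
            # Assumption: <eos> closes the current sentence; the next token starts a new one.
            doc += 1
            pos = 0
    # Token i sees token j only if j is not in the future and both are in the same sentence.
    mask = [[j <= i and doc_ids[i] == doc_ids[j] for j in range(n)] for i in range(n)]
    return mask, position_ids


# Three sentences packed into one sequence; 0 stands in for <eos> here.
tokens = [5, 6, 0, 7, 8, 9, 0, 4]
mask, position_ids = packed_causal_mask(tokens, eos_id=0)
```

With a plain causal mask, token 7 above could still attend to the first sentence; with this mask it cannot, which is the behaviour I am asking about.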