NLP course: why the model outputs logits - question

According to the explanation here:

> all :hugs: Transformers models output the logits, as the loss function for training will generally fuse the last activation function, such as SoftMax, with the actual loss function, such as cross-entropy

Can anyone please explain that? What does it mean to “fuse the last activation function, such as SoftMax, with the actual loss function…”?
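
For context, here is a minimal PyTorch sketch of what I *think* the fusing refers to (this is my own guess, not something the course states here): `nn.CrossEntropyLoss` takes raw logits and applies log-softmax internally, instead of expecting me to call softmax first.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Raw logits for a batch of 2 examples over 3 classes
logits = torch.tensor([[2.0, -1.0, 0.5],
                       [0.1, 3.0, -0.7]])
labels = torch.tensor([0, 1])

# "Fused": CrossEntropyLoss expects raw logits and applies
# log-softmax internally before the negative log-likelihood
fused_loss = nn.CrossEntropyLoss()(logits, labels)

# "Unfused" equivalent: apply log-softmax explicitly, then NLL loss
log_probs = F.log_softmax(logits, dim=-1)
unfused_loss = F.nll_loss(log_probs, labels)

print(fused_loss, unfused_loss)  # same value
```

If that's the right picture, is the point of doing it in one fused step just numerical stability (avoiding computing a softmax and then taking its log separately)? Or is there more to it?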