Fine-tune with SFTTrainer

When training on a multi-turn conversation (e.g. user message #1, assistant message #1, user message #2, assistant message #2), what exactly is the model's training target? Is the model trained to generate only the last assistant message in the sequence (assistant message #2), with all previous messages serving purely as context? Or is it trained to generate every assistant message (both assistant message #1 and assistant message #2), each conditioned on the turns that precede it?
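For concreteness, here is a toy sketch of the two labeling schemes I'm contrasting, using the usual `-100` ignore-index convention for positions that are masked out of the cross-entropy loss. The token strings are placeholders standing in for chat-templated spans, not actual SFTTrainer output:

```python
# Stand-ins for the token spans of a chat-templated 2-turn conversation.
tokens = ["<user1>", "<asst1>", "<user2>", "<asst2>"]

# Scheme A: only the final assistant message is a target;
# everything before it is context only (label -100 = ignored by the loss).
labels_last_only = [-100, -100, -100, "<asst2>"]

# Scheme B: every assistant message is a target, each one predicted
# from the turns before it; only the user turns are masked.
labels_all_assistant = [-100, "<asst1>", -100, "<asst2>"]

print(labels_last_only)
print(labels_all_assistant)
```

In other words: does SFTTrainer build labels like Scheme A or Scheme B when given a multi-message example?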