When we train on a sequence of messages (e.g. user message #1, assistant message #1, user message #2, assistant message #2), is the model trained to generate only the last assistant message in the sequence (assistant message #2), with all previous messages serving only as context? Or is it trained to generate each assistant message separately (both assistant message #1 and assistant message #2)? In other words, what exactly is the model's training target?
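For concreteness, the two interpretations can be sketched as label-masking schemes. This is a minimal sketch in plain Python with placeholder token ids and a hypothetical `build_labels` helper, not any specific trainer's API; it only illustrates which positions would contribute to the loss under each option:

```python
# Convention used by many trainers: positions labeled -100 are skipped by the loss.
IGNORE_INDEX = -100

# A conversation flattened to tokens, tracking each token's role.
# (Token values are placeholders, not real tokenizer output.)
conversation = [
    ("user", [11, 12, 13]),       # user message #1
    ("assistant", [21, 22]),      # assistant message #1
    ("user", [31, 32]),           # user message #2
    ("assistant", [41, 42, 43]),  # assistant message #2
]

def build_labels(conversation, target="all_assistant"):
    """Return (input_ids, labels); masked positions get IGNORE_INDEX."""
    input_ids, labels = [], []
    last_assistant = max(
        i for i, (role, _) in enumerate(conversation) if role == "assistant"
    )
    for i, (role, toks) in enumerate(conversation):
        input_ids.extend(toks)
        is_target = role == "assistant" and (
            target == "all_assistant" or i == last_assistant
        )
        labels.extend(toks if is_target else [IGNORE_INDEX] * len(toks))
    return input_ids, labels

# Option A: every assistant message contributes to the loss.
_, labels_all = build_labels(conversation, target="all_assistant")
# Option B: only the final assistant message is the target.
_, labels_last = build_labels(conversation, target="last_assistant")
```

Under option A, a single training example teaches both assistant turns at once (earlier user turns are still only context); under option B, assistant message #1 is pure context and only assistant message #2 is supervised.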