As far as I know, SFT is basically continued training: the weights are updated by having the model predict the next token. If this is correct, then why is SFT categorized under the Transformer Reinforcement Learning (TRL) API? Am I missing something?
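To make concrete what I mean by "predict the next token", here is a minimal sketch of the objective as I understand it. It is plain Python rather than anything from TRL, and the function name `next_token_loss` and the toy numbers are made up for illustration only.

```python
import math

def next_token_loss(logits, token_ids):
    """Mean cross-entropy of predicting token t+1 from position t.

    logits: one list of vocabulary scores per sequence position.
    token_ids: the target sequence, same length as logits.
    (Hypothetical helper for illustration, not a TRL function.)
    """
    total = 0.0
    # Position t predicts token t+1, so the last position has no target.
    for scores, target in zip(logits[:-1], token_ids[1:]):
        m = max(scores)
        # Numerically stable log-sum-exp over the vocabulary.
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        # Negative log-probability of the correct next token.
        total += log_z - scores[target]
    return total / (len(token_ids) - 1)

# Toy example: vocabulary of 3 tokens, sequence of 3 tokens.
toy_logits = [[2.0, 0.5, 0.1], [0.1, 0.2, 3.0], [1.0, 1.0, 1.0]]
toy_tokens = [0, 2, 1]
loss = next_token_loss(toy_logits, toy_tokens)
```

Nothing in this loss involves a reward or a policy update. It is ordinary supervised cross-entropy, which is why the categorization confuses me.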