Scalar Reward Model

I have a general question about reward model training for LLMs. In my application scenario, (1) the input is natural language text and the reward function is defined by scalar scores (0, 1, 2, etc.). This suggests that I should use the TextClassification interface to train my reward model. However, (2) my input also has a "context-response" structure, and the scalar scores measure how well the response fits the context.

My question: Is TextClassification the best interface for this? Ideally, I would like to train the reward model to predict the score for the response given the context, so perhaps I am looking for a conditional reward model, if such a thing exists?


It looks like TextClassification with RLHF is fine.
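To make this concrete: in Hugging Face Transformers a scalar reward model is typically a sequence-classification model with `num_labels=1`, trained as a regression over the score. The context-response structure is usually handled by concatenating the two into a single input text. Below is a minimal, hedged sketch of the data formatting; the `SEP` separator and `build_example` helper are illustrative choices, not part of any library.

```python
# Hypothetical sketch: turn (context, response, score) triples into
# examples for a sequence-classification reward model
# (e.g. AutoModelForSequenceClassification with num_labels=1).
# SEP and build_example are assumed names for illustration.
SEP = "\n\n### Response:\n"

def build_example(context: str, response: str, score: float) -> dict:
    """Concatenate context and response into one input text,
    with the scalar score as a regression label."""
    return {"text": context + SEP + response, "label": float(score)}

examples = [
    build_example("What is 2 + 2?", "4", 2.0),
    build_example("What is 2 + 2?", "I'm not sure.", 0.0),
]
```

The model then conditions on the context implicitly, because the context is part of the input sequence; no separate "conditional" interface is needed.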
