Training generative models based on "rewards"

Suppose we want to train BART/T5. Typically these models are trained assuming direct access to gold outputs. I am interested in a slightly different setting: suppose you don’t have the gold output, but you do have access to a black box (a reward function) that tells you how “correct” the current generation is. Does anyone have thoughts on how this could be done?
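One common family of approaches here is policy-gradient RL (e.g. REINFORCE): sample a generation from the model, score it with the black-box reward, and update the model by scaling the sequence log-probability by the (baseline-subtracted) reward. Below is a minimal, self-contained sketch of that idea; the tiny GRU decoder, the `reward_fn`, and all hyperparameters are hypothetical stand-ins so the example runs without downloading a real BART/T5 checkpoint — the update rule itself carries over unchanged.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

VOCAB, MAX_LEN, HID = 10, 5, 16

class TinyDecoder(nn.Module):
    """Toy stand-in for a seq2seq decoder (a real setup would use BART/T5)."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, HID)  # +1 for a BOS token
        self.rnn = nn.GRUCell(HID, HID)
        self.head = nn.Linear(HID, VOCAB)

    def forward(self, h, tok):
        h = self.rnn(self.embed(tok), h)
        return h, self.head(h)  # hidden state, next-token logits

def reward_fn(tokens):
    # Hypothetical black-box reward: fraction of even tokens in the output.
    # In practice this is whatever scorer tells you how "correct" the text is.
    return sum(1.0 for t in tokens if t % 2 == 0) / len(tokens)

model = TinyDecoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
baseline = 0.0  # running-mean baseline to reduce gradient variance

for step in range(200):
    # 1) Sample a generation from the current policy, keeping log-probs.
    h = torch.zeros(1, HID)
    tok = torch.tensor([VOCAB])  # BOS
    log_probs, tokens = [], []
    for _ in range(MAX_LEN):
        h, logits = model(h, tok)
        dist = torch.distributions.Categorical(logits=logits)
        tok = dist.sample()
        log_probs.append(dist.log_prob(tok))
        tokens.append(tok.item())

    # 2) Score the sampled sequence with the black-box reward.
    R = reward_fn(tokens)
    baseline = 0.9 * baseline + 0.1 * R

    # 3) REINFORCE update: raise the log-prob of samples that beat the
    #    baseline, lower it for samples that fall below it.
    loss = -(R - baseline) * torch.stack(log_probs).sum()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"reward of last sample: {reward_fn(tokens):.2f}")
```

Variance is the main practical obstacle: a single scalar reward per sequence makes gradients noisy, which is why people typically add a baseline (as above), sample multiple generations per input, or fall back on self-critical training / minimum-risk training variants of the same idea.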