Suppose I have N prompts (sentences) for generation. They are fed into GPT2, which produces the corresponding synthesized sentences. I also have a separate black box that returns a loss given these synthesized samples; the black box is just another component. So the natural loop is: for every batch, GPT2 generates samples, the black box returns the loss with respect to the current GPT2, and this repeats. The goal is for GPT2 to reduce the loss at each update iteration.
What I want to do is use the loss from the black box to update the parameters of GPT2 at each batch.
Generating with GPT2 is straightforward, but how can I implement the idea of updating it from this loss? Is there an example of doing this, especially of how to properly update the parameters? I mean, should all parameters be updated in the same way, without any distinction?
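To make the question concrete, here is a rough sketch of what I imagine. Since the sampling step is not differentiable and the black box only returns a scalar loss, I believe this calls for a policy-gradient (REINFORCE-style) update: re-score the sampled sequences with gradients enabled, then weight their log-probabilities by the (negated) black-box loss. The tiny randomly initialized GPT2 and `black_box_loss` below are placeholders I made up for illustration, not my real setup:

```python
import torch
from transformers import GPT2Config, GPT2LMHeadModel

torch.manual_seed(0)
# tiny random GPT2 so the sketch runs without downloading weights
config = GPT2Config(vocab_size=100, n_positions=64, n_embd=32,
                    n_layer=2, n_head=2,
                    bos_token_id=None, eos_token_id=None, pad_token_id=0)
model = GPT2LMHeadModel(config)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)

def black_box_loss(samples):
    # placeholder: the real black box would score the generated text
    return samples.float().mean(dim=1) / 100.0

prompt = torch.randint(0, 100, (4, 5))  # batch of 4 prompts, length 5
with torch.no_grad():                   # sampling itself carries no gradient
    gen = model.generate(prompt, max_new_tokens=8, do_sample=True, top_k=0)

# re-score the sampled sequences with gradients enabled
logits = model(gen).logits                                 # (B, T, vocab)
logp = torch.log_softmax(logits[:, :-1], dim=-1)
tok_logp = logp.gather(-1, gen[:, 1:].unsqueeze(-1)).squeeze(-1)
# only the generated continuation (not the prompt) contributes
seq_logp = tok_logp[:, prompt.size(1) - 1:].sum(dim=1)

loss_per_sample = black_box_loss(gen)   # one scalar per sampled sequence
reward = -loss_per_sample
advantage = reward - reward.mean()      # mean baseline to reduce variance

pg_loss = -(advantage.detach() * seq_logp).mean()
opt.zero_grad()
pg_loss.backward()
opt.step()
```

So if I understand correctly, backprop plus the optimizer decide how much each parameter moves (they are not all updated equally) — is that the right way to think about it, or is there a better-established recipe for this?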
Please give some thoughts, thanks.