T5 user-defined loss function

It’s a whole field in itself and difficult to describe in a paragraph, but I’ll try to point you in the right direction.

Check out reinforcement learning first, then read that Salesforce paper with newfound vigour! Sampling a discrete token is not differentiable on its own, so the workaround is to train a function to do this job: its input is your probability distribution, and its output is some index in the range [0, V), where V is the vocabulary size.
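To make the reinforcement learning angle a bit more concrete, here is a minimal sketch of the score-function (REINFORCE) estimator, one common way to get gradients for a reward computed on sampled indices. It is not necessarily what the Salesforce paper does, and it uses a toy softmax over four logits in place of T5. The names `reinforce_grad`, `train` and `toy_reward` are made up for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_grad(logits, reward_fn, rng, num_samples=64):
    """Monte Carlo estimate of d E[reward] / d logits (no baseline)."""
    probs = softmax(logits)
    vocab = len(logits)
    grad = [0.0] * vocab
    for _ in range(num_samples):
        # Sampling is the non-differentiable step: it returns an index in [0, V).
        k = rng.choices(range(vocab), weights=probs, k=1)[0]
        r = reward_fn(k)
        # For a softmax, d log p(k) / d logit_j = 1[j == k] - p_j.
        for j in range(vocab):
            grad[j] += r * ((1.0 if j == k else 0.0) - probs[j]) / num_samples
    return grad


def train(reward_fn, vocab=4, steps=200, lr=0.5, seed=0):
    """Gradient ascent on the expected reward; returns the final probabilities."""
    rng = random.Random(seed)
    logits = [0.0] * vocab
    for _ in range(steps):
        grad = reinforce_grad(logits, reward_fn, rng)
        logits = [x + lr * g for x, g in zip(logits, grad)]
    return softmax(logits)


def toy_reward(index):
    # Hypothetical user-defined reward: index 2 counts as the "correct" token.
    return 1.0 if index == 2 else 0.0


if __name__ == "__main__":
    print(train(toy_reward))
```

The reward function is never differentiated; only the log-probability of the sampled index is. In an autograd framework the same idea is typically written as minimising `-reward * log_prob(sample)`, usually with a baseline subtracted from the reward to reduce variance.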

Beyond this explanation, I’m afraid I can’t offer much more help. Check out some of the papers with reinforcement learning in them here
