T5 user-defined loss function

You need a differentiable model to do the sampling for you :slight_smile:

Let V be the set of words in the vocabulary. Some approaches frame this as a reinforcement learning problem with a state vector x of dimension |V|, such that each x_i can be any integer in V, and a discrete action space consisting of all integers in V.
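To make the reinforcement learning framing concrete, here is a minimal sketch using plain Python and a REINFORCE-style (score-function) update. That choice of algorithm is my assumption; nothing above commits to a specific one. The tiny vocabulary, the `target` sequence, the match-count `reward`, and the per-position logit table standing in for a real model like T5 are all hypothetical and only there for illustration. The point is that the reward itself never needs to be differentiable, because the gradient flows through the log-probability of the sampled tokens.

```python
import math
import random

def softmax(logits):
    # Numerically stable softmax over one row of logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample(probs, rng):
    # Draw one token index from the categorical distribution.
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

def reward(tokens, target):
    # Hypothetical user-defined score: number of matching positions.
    # It is non-differentiable, which is why a policy gradient is used.
    return sum(1.0 for a, b in zip(tokens, target) if a == b)

def train(vocab_size=5, target=(3, 1, 4), steps=5000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # One row of logits per output position (toy stand-in for a model).
    logits = [[0.0] * vocab_size for _ in target]
    baseline = 0.0  # running mean of the reward, used to reduce variance
    for _ in range(steps):
        probs = [softmax(row) for row in logits]
        actions = [sample(p, rng) for p in probs]
        r = reward(actions, target)
        advantage = r - baseline
        baseline += 0.05 * (r - baseline)
        for t, a in enumerate(actions):
            for i in range(vocab_size):
                # d log p(a) / d logit_i = 1[i == a] - p_i for a softmax policy.
                grad_logp = (1.0 if i == a else 0.0) - probs[t][i]
                # Gradient ascent on expected reward.
                logits[t][i] += lr * advantage * grad_logp
    return logits

logits = train()
# Greedy decode: most likely token at each position after training.
greedy = [max(range(len(row)), key=row.__getitem__) for row in logits]
print(greedy)
```

With a real seq2seq model the same idea would apply to the summed log-probabilities of the sampled output tokens, scaled by the advantage, in place of the hand-written logit update here.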

Someone linked a paper from Salesforce that follows this general idea but adds a few useful bells and whistles.