Note that TransformerXL is the only model in the library that does not work with Trainer, because the loss it returns is not reduced (it is an array rather than a scalar). You may be able to work around this by writing your own subclass of Trainer and overriding its compute_loss method to reduce that array to a scalar.
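A minimal sketch of that workaround follows. It assumes the PyTorch Trainer from transformers and assumes that the unreduced loss tensor is the first element of the model's outputs when labels are passed. The class name ReducedLossTrainer is hypothetical, and **kwargs is there only to absorb any extra arguments a given Trainer version passes to compute_loss.

```python
from transformers import Trainer


class ReducedLossTrainer(Trainer):
    """Hypothetical Trainer subclass that reduces a non-scalar loss to a scalar."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # Forward pass; Trainer includes the labels in `inputs`.
        outputs = model(**inputs)
        # Assumption: the (possibly unreduced) loss is the first output element.
        loss = outputs[0]
        # Average per-token losses into a single scalar for backpropagation.
        if loss.dim() > 0:
            loss = loss.mean()
        return (loss, outputs) if return_outputs else loss
```

You would then construct ReducedLossTrainer with the same arguments you would pass to Trainer. Taking the mean gives an average loss per token; reducing with sum() instead would also yield a scalar but would scale the gradients with the sequence length and batch size.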