Questions when fine-tuning Transformer-XL with Trainer

Note that Transformer-XL is the only model in the library that does not work with Trainer, because the loss it returns is not reduced (it is an array of per-token losses rather than a scalar). You can work around this by subclassing Trainer and overriding its compute_loss method to reduce that array to a scalar.
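A minimal sketch of such a subclass, assuming the model is called with labels so that the first element of its output is the unreduced loss tensor (the class name `TransfoXLTrainer` is made up for illustration):

```python
from transformers import Trainer


class TransfoXLTrainer(Trainer):
    """Trainer subclass that reduces Transformer-XL's per-token loss array."""

    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        # Transformer-XL returns an unreduced array of losses as the first
        # output when labels are passed; take the mean to get a scalar that
        # Trainer can backpropagate through.
        loss = outputs[0].mean()
        return (loss, outputs) if return_outputs else loss
```

You would then instantiate `TransfoXLTrainer` exactly as you would a regular `Trainer`, passing the same model, arguments, and datasets.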