Thanks for letting me know, that’s really helpful. Otherwise, I will keep working on figuring out why Trainer is not working with Transformer-XL.
I will try to rewrite the compute_loss function for it.