This might be related, as it discusses a recently fixed bug with a Colab notebook.
But to answer your question about the results: training with the Trainer or with your own loop should give the same results, as long as you use the same loss function and hyperparameters.
The Trainer is mostly there to take the boilerplate out of your way, especially for mixed-precision, distributed, and TPU training. It only runs the training loop (with good hyperparameter defaults), so it should match your manual training loop.
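To make the equivalence concrete, here is a toy sketch in plain Python (no transformers, no torch — the helper names `trainer_fit` and `manual_fit` are made up for illustration): a "trainer"-style helper and a hand-written loop produce identical results when the loss function and hyperparameters (learning rate, number of epochs) match, because they perform the exact same updates.

```python
def loss_grad(w, x, y):
    # Gradient of the squared error 0.5 * (w * x - y)**2 with respect to w.
    return (w * x - y) * x

def trainer_fit(w, data, lr, epochs):
    # Stand-in for a library Trainer: the same update rule, just wrapped up
    # so the caller never writes the loop themselves.
    for _ in range(epochs):
        for x, y in data:
            w -= lr * loss_grad(w, x, y)
    return w

def manual_fit(w, data, lr, epochs):
    # The same loop written out by hand.
    for _ in range(epochs):
        for x, y in data:
            w -= lr * loss_grad(w, x, y)
    return w

# Fit y = 2x from three samples; both paths take identical SGD steps.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_trainer = trainer_fit(0.0, data, lr=0.01, epochs=50)
w_manual = manual_fit(0.0, data, lr=0.01, epochs=50)
assert w_trainer == w_manual  # bit-for-bit identical trajectories
```

The same reasoning carries over to the real Trainer: differences you see in practice usually come from a different loss, optimizer, scheduler, or seed, not from the Trainer itself.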
Unfortunately, I never got it to work with the Transformer-XL implementation I was working on, but I modified BERT to fit my application and it works with that instead.