Difference between calling model() and using Trainer()?

Hi All,

I was wondering if there is any tangible difference between calling model() and feeding data in manually, versus using the Trainer() object.

If I run a program to batch my data and feed it manually, will the training results be the same as using Trainer()?

I only ask because I am getting some errors while applying the language modelling protocol (from: https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb#scrollTo=YZ9HSQxAAbme) with the Trainer() object, but feeding data manually seems to work fine.

This might be related, as it discusses a recently fixed bug with a Colab notebook.

But to answer your question concerning the results: the results when using the Trainer or your own training loop should be the same, as long as you use the same loss function and hyperparameters.

Trainer is mostly there to take the boilerplate out of your way, especially for mixed-precision training, distributed training and TPU training. It only runs the training loop (with good hyperparameter defaults), so its results should match those of a manual training loop.
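To make that concrete, here is a minimal sketch of the kind of manual loop Trainer wraps. This is plain PyTorch with a toy nn.Linear model standing in for a Transformer (the loop shape is what matters, not the model); with a Hugging Face model you would call outputs = model(**batch) and use outputs.loss instead of a separate loss function:

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

# Toy stand-in for a language model; the training-loop structure is the point.
model = nn.Linear(10, 2)
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
loss_fn = nn.CrossEntropyLoss()

# Random toy data: 32 samples, 10 features, binary labels.
dataset = TensorDataset(torch.randn(32, 10), torch.randint(0, 2, (32,)))
loader = DataLoader(dataset, batch_size=8, shuffle=True)

model.train()
for epoch in range(3):
    for inputs, labels in loader:
        optimizer.zero_grad()
        logits = model(inputs)          # with transformers: outputs = model(**batch)
        loss = loss_fn(logits, labels)  # with transformers: loss = outputs.loss
        loss.backward()
        optimizer.step()
```

Given the same model, data order, loss, and optimizer settings, this loop and Trainer should produce the same training results.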

Wonderful, thanks for the clarification!

I was dealing with a different error: RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I will look into it!
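For anyone else hitting this: that RuntimeError typically means loss.backward() was called on a tensor that is detached from the autograd graph, for example because the forward pass ran under torch.no_grad(), because the model returned no loss (no labels were passed, so Trainer had nothing to backprop), or because every parameter had requires_grad=False. A minimal reproduction with a toy model (not Transformer-XL specific):

```python
import torch
from torch import nn

model = nn.Linear(4, 2)

# Forward pass under no_grad: the resulting loss has no grad_fn,
# so calling backward() on it raises the RuntimeError from the thread.
with torch.no_grad():
    loss = model(torch.randn(3, 4)).sum()

try:
    loss.backward()
except RuntimeError as e:
    print(e)  # "element 0 of tensors does not require grad and does not have a grad_fn"
```

Checking that the loss tensor has a grad_fn (and that labels actually reach the model) is usually the quickest way to track this down.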

Did you find out what was wrong?
I’m running into the same kind of error with the Trainer object.

Unfortunately I never got it to work with the Transformer-XL implementation I was using, but I modified BERT to fit my application and it works with that instead.