I was wondering if there is any tangible difference between calling model() and feeding the data in manually, versus using the Trainer() class.
If I run a program to batch my data and feed it in manually, will the training results be the same as with Trainer()?
I only ask because I am getting some errors while applying the language modelling protocol (from:
https://colab.research.google.com/github/huggingface/blog/blob/master/notebooks/01_how_to_train.ipynb#scrollTo=YZ9HSQxAAbme) with the
Trainer() class, but running the loop manually seems to work fine.
This might be related, as it discusses a recently fixed bug with a Colab notebook.
@sgugger, in case you’re not aware of it, it seems the latest commit on master broke the Colab notebook you shared on Twitter
Trying to run that notebook, I hit the following error when calling
best_run = trainer.hyperparameter_search(n_trials=10, direction="maximize")
with the optuna backend.
invalid value encountered in double_scalars
[W 2020-10-22 14:58:41,815] Trial 0 f…
But to answer your question: the results from the Trainer and from your own training loop should be the same, as long as you use the same loss function and hyperparameters.
Trainer is mostly there to take the boilerplate out of your way, especially for mixed-precision, distributed, and TPU training, but it only runs the training loop (with good hyperparameter defaults), so it should match your manual training loop.
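To illustrate the point with a toy example (assumption: a plain-Python stand-in, not the actual Hugging Face Trainer API): a "trainer"-style helper that hides the loop and a hand-written loop converge to identical weights when they share the same loss, learning rate, and epoch count.

```python
def grad(w, x, y):
    # Gradient of the squared error (w*x - y)^2 with respect to w.
    return 2 * (w * x - y) * x

def trainer_fit(w, data, lr, epochs):
    """'Trainer'-style helper: the loop is hidden behind one call."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * grad(w, x, y)
    return w

def manual_fit(w, data, lr, epochs):
    """The exact same loop, written out by hand."""
    for _ in range(epochs):
        for x, y in data:
            w -= lr * grad(w, x, y)
    return w

# y = 2x, so both loops should recover w ≈ 2.0.
data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w_trainer = trainer_fit(0.0, data, lr=0.01, epochs=50)
w_manual = manual_fit(0.0, data, lr=0.01, epochs=50)
print(w_trainer == w_manual)  # True: same loss + same hyperparameters
```

Differences only show up when the two setups diverge on something concrete: loss definition, learning-rate schedule, gradient clipping, seeding, and so on.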
Wonderful, thanks for the clarification!
I was dealing with a different error:
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
I will look into it!
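For what it's worth, that RuntimeError usually means the loss tensor is not connected to the autograd graph, e.g. the forward pass ran under torch.no_grad() or the leaf tensors have requires_grad=False. A minimal reproduction and fix in plain PyTorch (assumption: this is the general cause, not necessarily what the notebook hits):

```python
import torch

# Reproduce: a tensor with requires_grad=False produces results with no
# grad_fn, so calling .backward() on anything derived from it fails.
x = torch.ones(3)            # requires_grad defaults to False
loss = (x * 2).sum()
try:
    loss.backward()
except RuntimeError as e:
    print(e)  # element 0 of tensors does not require grad and does not have a grad_fn

# Fix: build the graph, either by tracking gradients on the leaf tensor
# or by not wrapping the forward pass in torch.no_grad().
x = torch.ones(3, requires_grad=True)
loss = (x * 2).sum()
loss.backward()              # works: loss now has a grad_fn
print(x.grad)                # each element receives gradient 2
```

With a model, the equivalent checks are that the parameters have requires_grad=True and that the forward pass computing the loss is not inside a torch.no_grad() block.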
Did you find out what was wrong?
I’m running into the same kind of error with the Trainer().
Unfortunately I never got it to work with the Transformer-XL implementation I was working on, but I modified BERT to fit my application and it works with that instead