The `Trainer` class is not built to optimize two models at the same time, so no, there is no easier way than subclassing and overriding `training_step`. In general, subclassing the `Trainer` and overriding the method(s) to fit your needs is the expected way, and we designed the `Trainer` API to make that as easy as possible.
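To make that concrete, here is a minimal sketch of what such a subclass could look like. `TwoModelTrainer`, `second_model`, and the way the two losses are combined are all illustrative assumptions, not an official recipe; the exact `training_step` signature and the mixed-precision/accumulation handling also vary between `transformers` versions, so check the implementation you are overriding.

```python
# Hypothetical sketch: a Trainer subclass that optimizes a second model
# alongside the main one. `second_model` and the 1:1 loss sum are
# assumptions for illustration.
import torch
from transformers import Trainer


class TwoModelTrainer(Trainer):
    def __init__(self, *args, second_model=None, **kwargs):
        super().__init__(*args, **kwargs)
        self.second_model = second_model

    def training_step(self, model, inputs):
        model.train()
        self.second_model.train()
        inputs = self._prepare_inputs(inputs)

        # Combine the losses of both models; the weighting between the
        # two terms depends entirely on your setup.
        loss = model(**inputs).loss + self.second_model(**inputs).loss

        if self.args.gradient_accumulation_steps > 1:
            loss = loss / self.args.gradient_accumulation_steps

        loss.backward()
        return loss.detach()
```

Note that for the second model's parameters to actually be updated, they must be in the optimizer, so you would typically also override `create_optimizer` (or pass a custom optimizer to the `Trainer`) to include both parameter sets.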
For predict/evaluate, yes: the `Trainer` needs tensors of the same size (with the exception of the batch dimension), otherwise it won’t be able to concatenate all the predictions. This is something we’ll look into more when we rewrite the token-classification examples (in the next few weeks).
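The constraint comes from how per-batch outputs are merged: `torch.cat` requires every dimension except the one being concatenated to match. A small sketch of the usual workaround, padding the sequence dimension to a common length (the shapes and the `-100` sentinel, often used to mark ignored token labels, are illustrative):

```python
import torch
import torch.nn.functional as F

# Two hypothetical batches of token-classification logits with
# different sequence lengths: (batch, seq_len, num_labels).
batch_a = torch.zeros(8, 12, 5)
batch_b = torch.zeros(8, 9, 5)

# Pad the shorter batch on the sequence dimension. F.pad takes pads
# from the last dimension backwards: (0, 0) leaves num_labels alone,
# (0, max_len - 9) pads seq_len on the right with the sentinel -100.
max_len = max(batch_a.shape[1], batch_b.shape[1])
padded_b = F.pad(batch_b, (0, 0, 0, max_len - batch_b.shape[1]), value=-100)

# Now all non-batch dimensions match, so concatenation works.
merged = torch.cat([batch_a, padded_b], dim=0)
print(merged.shape)  # torch.Size([16, 12, 5])
```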