I’m having trouble getting my data into the proper shape to train a TransformerXL model from scratch. I have a custom PyTorch `Dataset` whose `__getitem__` returns a dict with the keys `input_ids` and `labels`, both assigned the same 1-D Tensor of ids for a particular sequence. When I pass this `Dataset` to the `Trainer` object with the default collator I get a `grad can be implicitly created only for scalar outputs` error. What am I missing here? Am I missing a needed field for the `forward` method? I’ve tried just about everything and cannot figure it out.
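A simplified version of what I have (the real tokenization happens elsewhere; the class name and `self.sequences` are just stand-ins):

```python
import torch
from torch.utils.data import Dataset

class LMDataset(Dataset):
    """Each example is one tokenized sequence of token ids."""

    def __init__(self, sequences):
        # sequences: list of 1-D LongTensors of token ids, all the same length
        self.sequences = sequences

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        ids = self.sequences[idx]
        # input_ids and labels are the same sequence for causal LM training
        return {"input_ids": ids, "labels": ids}
```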
It would be hard to answer this without looking at the rest of the code. Could you post a small snippet of how you set up the model and the `Trainer`?
TransformerXL is a language model, so the required inputs are `input_ids` and `labels`, which are of shape `(batch_size, sequence_length)`.
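For instance, with the stock `default_data_collator` from `transformers` (a made-up two-example batch just to show the shapes), per-example 1-D tensors get stacked into exactly that:

```python
import torch
from transformers import default_data_collator

# Two fake examples shaped the way a Dataset like the one above returns them.
examples = [
    {"input_ids": torch.arange(16), "labels": torch.arange(16)},
    {"input_ids": torch.arange(16), "labels": torch.arange(16)},
]

batch = default_data_collator(examples)
print(batch["input_ids"].shape)  # torch.Size([2, 16]) -> (batch_size, sequence_length)
print(batch["labels"].shape)     # torch.Size([2, 16])
```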
Thanks for the reply. After wrestling with this for a little longer I realized that TransformerXL’s `forward` method isn’t compatible with the `Trainer` class because it returns a 1-D Tensor of `losses` rather than a scalar `loss`. This is confirmed in the issue I created on GitHub. I’m going to try to create a custom `Trainer` that reduces this loss to a scalar and post my results here.
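Roughly what I have in mind (an untested sketch; it assumes the installed `transformers` version exposes the `Trainer.compute_loss` hook and that the per-token losses come first in the model output when `labels` are passed):

```python
from transformers import Trainer

class TransfoXLTrainer(Trainer):
    """Reduce TransformerXL's per-token losses to the scalar the Trainer expects."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        outputs = model(**inputs)
        losses = outputs[0]   # per-token losses instead of the usual scalar loss
        loss = losses.mean()  # reduce so loss.backward() gets a scalar
        return (loss, outputs) if return_outputs else loss
```

Taking the mean should be enough to make the default training loop's backward pass work; whether mean or sum is the better reduction probably only affects the effective learning-rate scale.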