I’m having trouble getting my data into the proper shape to train a TransformerXL model from scratch. I have a custom PyTorch Dataset whose __getitem__ returns a dict with the keys input_ids and labels, both assigned the same 1-D tensor of token ids for a particular sequence. When I pass this Dataset to the Trainer with the default collator, I get a grad can be implicitly created only for scalar outputs error. What am I missing here? Is there a field I need to add for TransfoXLLMHeadModel's forward method? I’ve tried just about everything and can’t figure it out.
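Roughly, the Dataset looks like this (a simplified sketch; the class and variable names are just placeholders):

```python
import torch
from torch.utils.data import Dataset


class SequenceDataset(Dataset):
    """Simplified stand-in for my Dataset: one pre-tokenized sequence per item."""

    def __init__(self, token_id_sequences):
        # each element is a 1-D list of token ids for one sequence
        self.sequences = token_id_sequences

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        ids = torch.tensor(self.sequences[idx], dtype=torch.long)
        # input_ids and labels are the same 1-D tensor of ids
        return {"input_ids": ids, "labels": ids}
```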
It would be hard to answer this without looking at the code. Could you post a small code snippet?
Also, TransformerXL is a language model, so the required inputs are input_ids and labels, both of shape [batch_size, seq_len].
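For example, something like this shows the expected shapes (a tiny made-up config, just for illustration):

```python
import torch
from transformers import TransfoXLConfig, TransfoXLLMHeadModel

# tiny config, only to illustrate the input/output shapes
config = TransfoXLConfig(
    vocab_size=1000, cutoffs=[500], d_model=64, d_embed=64,
    n_head=2, d_head=32, d_inner=128, n_layer=2, mem_len=64,
)
model = TransfoXLLMHeadModel(config)

batch_size, seq_len = 4, 32
input_ids = torch.randint(0, config.vocab_size, (batch_size, seq_len))
labels = input_ids.clone()

outputs = model(input_ids=input_ids, labels=labels, return_dict=True)
print(outputs.losses.shape)  # [batch_size, seq_len - 1]
```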
Thanks for the reply. After wrestling with this a little longer, I realized that TransfoXLLMHeadModel's forward method isn’t compatible with the Trainer class out of the box, because it returns a 1-D tensor of per-token losses rather than a scalar loss. This is confirmed in the issue I created on GitHub. I’m going to try to create a custom Trainer that reduces this loss to a scalar and will post my results here.
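The rough idea would be something like this (untested so far, and assuming a transformers version where Trainer exposes compute_loss as the override point):

```python
from transformers import Trainer


class TransfoXLTrainer(Trainer):
    """Reduce TransfoXL's unreduced per-token losses to a scalar for backprop."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        outputs = model(**inputs)
        # TransfoXLLMHeadModel returns per-token losses of shape
        # [batch_size, seq_len - 1] rather than a single scalar loss
        losses = outputs.losses if hasattr(outputs, "losses") else outputs[0]
        loss = losses.mean()
        return (loss, outputs) if return_outputs else loss
```

Then I’d construct it exactly like the stock Trainer (same args, dataset, and collator) and see whether training runs.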