Data shape needed for training TransformerXL from scratch

Hello everyone,

I’m having trouble getting my data into the proper shape to train a TransformerXL model from scratch. I have a custom PyTorch Dataset whose __getitem__ returns a dict with the keys input_ids and labels, both assigned the same 1-D Tensor of ids for a particular sequence. When I pass this Dataset to the Trainer object with the default data collator, I get a "grad can be implicitly created only for scalar outputs" error. What am I missing here? Am I missing a required field for the TransfoXLLMHeadModel's forward method? I’ve tried just about everything and cannot figure it out.
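For reference, my Dataset looks roughly like this (simplified; the class name and variable names are just illustrative):

```python
import torch
from torch.utils.data import Dataset

class SequenceDataset(Dataset):
    """Simplified sketch of the Dataset described above."""

    def __init__(self, sequences):
        # sequences: a list of 1-D LongTensors of token ids
        self.sequences = sequences

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        ids = self.sequences[idx]
        # The same 1-D tensor of ids is used for both inputs and labels
        return {"input_ids": ids, "labels": ids}
```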

Hi @jodiak

It would be hard to answer this without looking at the rest of the code; could you post a small snippet of your training setup?
Also, TransformerXL is a language model, so the required inputs are input_ids and labels, both of shape [batch_size, seq_len].
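For example, a quick sanity check with dummy ids would look something like this (using the pretrained transfo-xl-wt103 checkpoint just for illustration):

```python
import torch
from transformers import TransfoXLLMHeadModel

model = TransfoXLLMHeadModel.from_pretrained("transfo-xl-wt103")

# input_ids and labels must both have shape [batch_size, seq_len]
input_ids = torch.randint(0, model.config.vocab_size, (2, 32))
outputs = model(input_ids=input_ids, labels=input_ids)
```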

Hello @valhalla

Thanks for the reply. After wrestling with this for a little longer, I realized that TransfoXLLMHeadModel's forward method isn't compatible with the Trainer class because it returns a tensor of per-token losses rather than a scalar loss, which is what triggers the "grad can be implicitly created only for scalar outputs" error. This is confirmed in the issue I created on GitHub. I'm going to try to create a custom Trainer that reduces this loss and will post my results here.
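For anyone who hits the same error, the plan is a Trainer subclass along these lines. This is a minimal sketch that just takes the mean of the per-token losses; the exact compute_loss signature may differ between transformers versions:

```python
from transformers import Trainer

class TransfoXLTrainer(Trainer):
    def compute_loss(self, model, inputs, return_outputs=False):
        outputs = model(**inputs)
        # When labels are passed, TransfoXLLMHeadModel's first output is a
        # tensor of per-token losses; reduce it to a scalar so that
        # loss.backward() works inside the training loop.
        loss = outputs[0].mean()
        return (loss, outputs) if return_outputs else loss
```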