Hello everyone,
I’m having trouble getting my data into the proper shape to train a Transformer-XL model from scratch. I have a custom PyTorch `Dataset` whose `__getitem__` returns a dict with the keys `input_ids` and `labels`, both assigned the same 1-D tensor of token ids for a particular sequence. When I pass this `Dataset` to the `Trainer` with the default collator, I get a `grad can be implicitly created only for scalar outputs` error. What am I missing here? Does `TransfoXLLMHeadModel`'s `forward` method need an additional field? I’ve tried just about everything and cannot figure it out.