Data shape needed for training TransformerXL from scratch

jodiak · January 6, 2021, 7:16am

Hello everyone,

I’m having trouble getting my data into the right shape to train a TransformerXL model from scratch. I have a custom PyTorch Dataset whose __getitem__ returns a dict with the keys input_ids and labels, both assigned the same 1-D tensor of token ids for a given sequence. When I pass this Dataset to the Trainer with default_data_collator, I get a "grad can be implicitly created only for scalar outputs" error. What am I missing here? Is there a field that TransfoXLLMHeadModel's forward method needs? I’ve tried just about everything and can’t figure it out.
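A minimal sketch of the setup described above, assuming fixed-length sequences; `TokenIdDataset` and `backward_error_message` are hypothetical names used only for illustration. It shows that the quoted error is what PyTorch raises when `backward()` is called on a tensor with more than one element, while a scalar (for example, the mean of that tensor) backpropagates without complaint.

```python
import torch
from torch.utils.data import Dataset


class TokenIdDataset(Dataset):
    """Hypothetical stand-in for the custom Dataset described above.

    Every item is a dict whose "input_ids" and "labels" hold the same
    1-D tensor of token ids for one sequence.
    """

    def __init__(self, sequences):
        # sequences: a list of equal-length lists of token ids
        self.sequences = [torch.tensor(seq, dtype=torch.long) for seq in sequences]

    def __len__(self):
        return len(self.sequences)

    def __getitem__(self, idx):
        ids = self.sequences[idx]
        return {"input_ids": ids, "labels": ids}


def backward_error_message(loss):
    """Run loss.backward() and return the RuntimeError text, or None if it succeeds."""
    try:
        loss.backward()
    except RuntimeError as err:
        return str(err)
    return None


# Unreduced per-token losses (2 sequences x 7 positions) cannot be
# backpropagated without an explicit gradient argument...
per_token = torch.rand(2, 7, requires_grad=True) * 2.0
print(backward_error_message(per_token))  # prints the RuntimeError message
# ...but their mean is a scalar, so backward() works.
print(backward_error_message(per_token.mean()))  # None
```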

valhalla · January 11, 2021, 6:45am

Hi @jodiak

It would be hard to answer this without seeing the code. Could you post a small code snippet?
Also, TransformerXL is a language model, so the required inputs are input_ids and labels, both of shape [batch_size, seq_len].
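The [batch_size, seq_len] shape comes from stacking equal-length 1-D items along a new first dimension. Below is a simplified, hypothetical `stack_collate` that does only that; it is a stand-in for illustration, not the implementation of the library's default collator. `torch.stack` raises an error if the sequences differ in length, so this assumes padding or fixed-length chunking has already happened.

```python
import torch


def stack_collate(features):
    """Hypothetical, simplified collate function.

    Stacks each key's 1-D tensors into a 2-D tensor of shape
    [batch_size, seq_len]. All sequences must have the same length.
    """
    keys = features[0].keys()
    return {key: torch.stack([f[key] for f in features]) for key in keys}


features = [
    {"input_ids": torch.tensor([1, 2, 3, 4]), "labels": torch.tensor([1, 2, 3, 4])},
    {"input_ids": torch.tensor([5, 6, 7, 8]), "labels": torch.tensor([5, 6, 7, 8])},
]
batch = stack_collate(features)
print(batch["input_ids"].shape)  # torch.Size([2, 4])
```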

jodiak · January 12, 2021, 5:07am

Hello @valhalla

Thanks for the reply. After wrestling with this a little longer, I realized that TransfoXLLMHeadModel's forward method isn’t compatible with the Trainer class because it returns a 1-D tensor of losses rather than a scalar loss. This is confirmed in this issue I created on GitHub. I’m going to try to create a custom Trainer that reduces this loss, and I’ll post my results here.
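One way such a reduction could look, sketched under assumptions: the model output exposes its unreduced per-token losses as `losses` (the field documented for `TransfoXLLMHeadModel` outputs in Transformers 4.x), and `Trainer.compute_loss` is called as `(model, inputs, return_outputs=False)`. `ScalarLossMixin` and `reduce_losses` are hypothetical names. Averaging turns the per-token losses into the single-element tensor that `backward()` needs.

```python
import torch


def reduce_losses(losses):
    """Average a tensor of per-token losses into a scalar for backward()."""
    return losses.mean()


class ScalarLossMixin:
    """Hypothetical mixin that overrides Trainer.compute_loss.

    Assumes the model output carries unreduced per-token losses in
    `.losses`. The **kwargs absorbs extra arguments that some Trainer
    versions pass to compute_loss.
    """

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        outputs = model(**inputs)
        loss = reduce_losses(outputs.losses)  # shape: () instead of a vector
        return (loss, outputs) if return_outputs else loss
```

To use it, the mixin would go first in the bases, e.g. `class XLTrainer(ScalarLossMixin, Trainer): pass`, so that its `compute_loss` takes precedence over the one on `Trainer`. This override skips any extra handling the stock `compute_loss` does (label smoothing, for instance), so treat it as a starting point rather than a drop-in replacement.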
