How to use Transformer XL for sequence classification?

I cannot understand how to fine-tune ‘Transformer XL’ for sequence classification.

I am getting this error

RuntimeError: stack expects each tensor to be equal size, but got [1] at entry 0 and [8] at entry 2

I understand that it is caused by my sequences having different lengths, but I am not sure how this is meant to be handled for this specific model. I have created a simple, reproducible toy example to understand how this model is intended to be used:

!pip install transformers==4.10.0
!pip install datasets==1.9.0

from transformers import AutoTokenizer
from transformers import TransfoXLForSequenceClassification
from transformers import TrainingArguments, Trainer
import torch

class newDatasetSimple(torch.utils.data.Dataset):

    def __init__(self, encodings, labels):
        self.encodings = encodings
        self.labels = labels

    def __getitem__(self, idx):
        item = {key: torch.tensor(val[idx]) for key, val in self.encodings.items()}
        item['labels'] = torch.tensor(self.labels[idx])
        # print(item)
        return item

    def __len__(self):
        return len(self.labels)

tokenizer = AutoTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLForSequenceClassification.from_pretrained('transfo-xl-wt103')

texts = ['This is a sentence', 'Here is another', 'Short sentence', 'I ran', 'This is the longest sentence of the bunch', 'What?', 'Who?', 'Test.', 'Hey', 'A', 'The', 'So', 'yes', 'cool', 'beans', 'In the flesh']
labels = [0,0,1,1,0,0,1,1,0,0,1,1,0,0,1,1]
encodings = tokenizer(list(texts))

ds = newDatasetSimple(encodings, labels)

training_args = TrainingArguments(output_dir='.')  # other arguments omitted

trainer = Trainer(model=model,
                  args=training_args,
                  train_dataset=ds)
trainer.train()


Could someone show me what changes need to be made in the code to get the model to train properly?

Thank you for reading!

Can you post the full error message?

As noted here, TransformerXL is the only model in the library that is not supported by the Trainer (you would need to overwrite it).

Okay, so I need to override Trainer with a custom loss function that converts the array to a scalar. What is the meaning of the array of losses? Should it simply be summed?
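One possible reading of "converting the array to a scalar" is sketched here using only PyTorch. If a model returns an unreduced tensor of losses (for example one value per token), it has to be collapsed to a single number before `backward()` can be called. `reduce_loss` is a hypothetical helper name, and choosing the mean rather than the sum is an assumption, since nothing in this thread says which reduction is intended. A helper like this could be called from an overridden `compute_loss` in a `Trainer` subclass.

```python
import torch


def reduce_loss(losses: torch.Tensor) -> torch.Tensor:
    """Collapse an unreduced loss tensor of any shape into a scalar.

    Assumption: the mean is used so the value does not grow with batch
    size or sequence length; a sum would also produce a scalar.
    """
    return losses.mean()


# Stand-in for unreduced losses of shape (batch_size, sequence_length).
per_token_losses = torch.tensor(
    [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]], requires_grad=True
)

loss = reduce_loss(per_token_losses)
loss.backward()  # backward() without arguments needs a scalar loss

print(loss.dim(), loss.item())
```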

In any case, I do not believe that is the source of my error. I think I need to prepare my dataset differently so that it can be consumed by the Transformer XL model, but it is not clear to me how this should be done.
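One way the dataset preparation could be handled is to pad each batch to its longest sequence at collation time. The sketch below is not a confirmed recipe for Transformer XL: `pad_collate` is a hypothetical name, and `pad_token_id=0` is a placeholder assumption, because this thread does not establish whether the `transfo-xl-wt103` tokenizer defines a padding token or which id the model expects.

```python
import torch
from torch.nn.utils.rnn import pad_sequence


def pad_collate(features, pad_token_id=0):
    """Pad every `input_ids` in a batch to the longest one, then stack.

    `pad_token_id=0` is a placeholder assumption, not a value taken
    from the tokenizer or the model config.
    """
    input_ids = [
        torch.as_tensor(f["input_ids"], dtype=torch.long) for f in features
    ]
    labels = torch.as_tensor(
        [int(f["labels"]) for f in features], dtype=torch.long
    )
    return {
        "input_ids": pad_sequence(
            input_ids, batch_first=True, padding_value=pad_token_id
        ),
        "labels": labels,
    }


# Made-up token ids with lengths 1, 3 and 2, shaped like the items
# returned by newDatasetSimple.__getitem__.
features = [
    {"input_ids": torch.tensor([7]), "labels": torch.tensor(0)},
    {"input_ids": torch.tensor([4, 5, 6]), "labels": torch.tensor(1)},
    {"input_ids": torch.tensor([8, 9]), "labels": torch.tensor(0)},
]

batch = pad_collate(features)
print(batch["input_ids"])
print(batch["labels"])
```

A function like this can be passed to `Trainer` through its `data_collator` argument, so that it replaces the default collator that appears in the traceback.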

Here is the full error message

RuntimeError                              Traceback (most recent call last)
<ipython-input-4-3d56e61e3b4e> in <module>()
      8                   args=training_args,
      9                   train_dataset=ds)
---> 10 trainer.train()

5 frames
/usr/local/lib/python3.7/dist-packages/transformers/ in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1256             self.control = self.callback_handler.on_epoch_begin(args, self.state, self.control)
-> 1258             for step, inputs in enumerate(epoch_iterator):
   1260                 # Skip past any already trained steps if resuming training

/usr/local/lib/python3.7/dist-packages/torch/utils/data/ in __next__(self)
    519             if self._sampler_iter is None:
    520                 self._reset()
--> 521             data = self._next_data()
    522             self._num_yielded += 1
    523             if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.7/dist-packages/torch/utils/data/ in _next_data(self)
    559     def _next_data(self):
    560         index = self._next_index()  # may raise StopIteration
--> 561         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    562         if self._pin_memory:
    563             data = _utils.pin_memory.pin_memory(data)

/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/ in fetch(self, possibly_batched_index)
     45         else:
     46             data = self.dataset[possibly_batched_index]
---> 47         return self.collate_fn(data)

/usr/local/lib/python3.7/dist-packages/transformers/data/ in default_data_collator(features, return_tensors)
     65     if return_tensors == "pt":
---> 66         return torch_default_data_collator(features)
     67     elif return_tensors == "tf":
     68         return tf_default_data_collator(features)

/usr/local/lib/python3.7/dist-packages/transformers/data/ in torch_default_data_collator(features)
    103         if k not in ("label", "label_ids") and v is not None and not isinstance(v, str):
    104             if isinstance(v, torch.Tensor):
--> 105                 batch[k] = torch.stack([f[k] for f in features])
    106             else:
    107                 batch[k] = torch.tensor([f[k] for f in features])

RuntimeError: stack expects each tensor to be equal size, but got [1] at entry 0 and [8] at entry 2
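The last frame of the traceback can be reproduced in isolation, which shows that the error comes from the default collator calling `torch.stack` on tensors of different lengths rather than from the model itself. This is a minimal sketch with made-up token ids; the lengths 1 and 8 mirror the sizes reported in the error message.

```python
import torch

# Two "tokenized sentences" of different lengths (made-up ids).
short = torch.tensor([11])
long = torch.tensor([21, 22, 23, 24, 25, 26, 27, 28])

# torch.stack needs every tensor to have the same shape.
try:
    torch.stack([short, long])
    stack_failed = False
except RuntimeError as err:
    stack_failed = True
    print("stack failed:", err)

# Once the shorter tensor is extended to the same length, stacking works.
# Zero is used as filler here purely for illustration.
filler = short.new_zeros(long.numel() - short.numel())
short_padded = torch.cat([short, filler])
stacked = torch.stack([short_padded, long])
print(stacked.shape)
```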