Questions when doing Transformer-XL Finetune with Trainer

huwendeng · April 3, 2021, 8:25am

Hi everyone,

Nice to see you here.

I’m new to the Transformer-XL model. I’m following “Fine-tuning with custom datasets” to fine-tune Transformer-XL with Trainer (for a sequence classification task).

First, I followed the instructions above exactly, except for:

tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLForSequenceClassification.from_pretrained("transfo-xl-wt103")

By doing this, I got ‘RuntimeError: stack expects each tensor to be equal size, but got [25] at entry 0 and [24] at entry 1.’ I think the cause is that the sequences in the same batch need to be padded to the same length. Let me know if I’m wrong. I probably need a data_collator to solve this problem. Is there a built-in data_collator in Hugging Face for this? If not, is there an example of how to override the data_collator?
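
A note on the padding question: the library does ship `DataCollatorWithPadding`, which pads each batch through the tokenizer, but whether it works unchanged with the TransfoXL tokenizer is not established in this thread. As a rough sketch only, a hand-written collator could look like the following. The names `pad_features` and `collate_fn` are made up for illustration, the pad id of 0 is the `tokenizer.pad_token_id` value reported later in this post, and each dataset item is assumed to be a dict with a plain list of ints under `input_ids` and a label under `labels`.

```python
def pad_features(features, pad_token_id=0):
    """Right-pad every `input_ids` list to the longest one in the batch.

    Assumption: each feature is a dict with a list of ints under
    "input_ids" and a label under "labels".
    """
    max_len = max(len(f["input_ids"]) for f in features)
    batch = {"input_ids": [], "labels": []}
    for f in features:
        ids = list(f["input_ids"])
        batch["input_ids"].append(ids + [pad_token_id] * (max_len - len(ids)))
        batch["labels"].append(f["labels"])
    return batch


def collate_fn(features):
    """Hypothetical collator: pad the batch, then convert it to tensors."""
    import torch  # imported here so pad_features stays usable without torch

    padded = pad_features(features, pad_token_id=0)
    # No attention_mask is built: as far as I know, the TransfoXL forward
    # pass does not accept one (treat this as an assumption).
    return {key: torch.tensor(value) for key, value in padded.items()}
```

Such a function could then be handed to the trainer as `Trainer(..., data_collator=collate_fn)`.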

Second, I changed the code to:

tokenizer = TransfoXLTokenizer.from_pretrained('transfo-xl-wt103')
model = TransfoXLForSequenceClassification.from_pretrained("transfo-xl-wt103")

train_texts = [train_text[:120] for train_text in train_texts]
val_texts = [val_text[:120] for val_text in val_texts]
test_texts = [test_text[:120] for test_text in test_texts]

tokenizer.pad_token = tokenizer.eos_token

train_encodings = tokenizer(train_texts, padding=True, max_length='120')
val_encodings = tokenizer(val_texts, padding=True, max_length='120')
test_encodings = tokenizer(test_texts, padding=True, max_length='120')

multilabel_trainer = Trainer(
    model=model,                  # the instantiated Transformers model to be trained
    tokenizer=tokenizer,
    args=training_args,           # training arguments, defined above
    train_dataset=train_dataset,  # training dataset
    eval_dataset=val_dataset      # evaluation dataset
)

By doing this, I think I made the sequences in the same batch the same size. However, I got the error ‘AssertionError: Cannot handle batch sizes > 1 if no padding token is defined.’ I checked my tokenizer:

tokenizer.pad_token returns ‘’ and tokenizer.pad_token_id returns 0.
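
One possible explanation, offered as an assumption and not a confirmed diagnosis: in this library's sequence classification heads, the batch-size assertion appears to check `model.config.pad_token_id`, not the tokenizer, so setting `tokenizer.pad_token` alone would not satisfy it. A minimal sketch of copying the id across follows. The helper name `sync_pad_token` is hypothetical, and it only relies on the `config.pad_token_id` and `pad_token_id` attributes.

```python
def sync_pad_token(model, tokenizer):
    """Copy the tokenizer's pad token id onto the model config if unset.

    Hypothetical helper: works with any objects exposing
    `model.config.pad_token_id` and `tokenizer.pad_token_id`.
    """
    if model.config.pad_token_id is None:
        model.config.pad_token_id = tokenizer.pad_token_id
    return model.config.pad_token_id
```

With the objects from this post, that would be called as `sync_pad_token(model, tokenizer)` before building the Trainer.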

Sometimes I also get a CUDA out-of-memory error, even though I restarted the GPU and checked its memory with nvidia-smi before running the code.

Last, I changed the batch size to 1; it trained for 11 steps and then ran out of CUDA memory. My GPU is a P100 with 16 GB of memory, so I don’t think it should fill up so quickly. (I have used this GPU to fine-tune BERT successfully.)

I have no idea what I did wrong. Any suggestions or help will be appreciated.

For your convenience, I uploaded the notebook here.

Best!

sgugger · April 5, 2021, 1:31pm

Note that TransformerXL is the only model of the library that does not work with Trainer, as the loss it returns is not reduced (it’s an array and not a scalar). You might get away with it by implementing your own subclass of Trainer and overriding the compute_loss function to convert that array to a scalar.
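
A minimal sketch of that suggestion, under a few assumptions: the model output exposes the unreduced loss under the key `loss`, and averaging is an acceptable reduction. The class name `MeanLossMixin` is hypothetical. It is written as a mixin so the logic can be read on its own, and it would be combined with `Trainer` as shown in the trailing comment.

```python
class MeanLossMixin:
    """Hypothetical mixin that reduces an array-valued loss to a scalar."""

    def compute_loss(self, model, inputs, return_outputs=False, **kwargs):
        # **kwargs absorbs extra arguments that some Trainer versions pass.
        outputs = model(**inputs)
        # Assumption: the loss is stored under "loss"; fall back to the
        # first element for tuple-style outputs.
        loss = outputs["loss"] if isinstance(outputs, dict) else outputs[0]
        loss = loss.mean()  # collapse the per-element losses to one scalar
        return (loss, outputs) if return_outputs else loss


# Intended use, assuming `transformers` is installed:
#
#     from transformers import Trainer
#
#     class TransfoXLTrainer(MeanLossMixin, Trainer):
#         pass
```

`TransfoXLTrainer` would then be constructed with the same arguments as the plain `Trainer` in the original post.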

huwendeng · April 6, 2021, 3:28am

Thanks for letting me know! That’s really helpful. Otherwise I would have kept trying to figure out why Trainer doesn’t work with Transformer-XL.
I will try to rewrite the compute_loss function for it.

Gianluca · October 6, 2021, 5:02am

Did you figure out how to use Transformer-XL for sequence classification? @huwendeng
