Hi everyone,
Nice to see you here.
I’m new to the Transformer-XL model. I’m following the Fine-tuning with custom datasets tutorial to fine-tune Transformer-XL with the Trainer on a sequence classification task.
First, I followed the instructions above exactly, except for:
tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLForSequenceClassification.from_pretrained("transfo-xl-wt103")
By doing this, I got 'RuntimeError: stack expects each tensor to be equal size, but got [25] at entry 0 and [24] at entry 1.' I think the reason for the error is that I need to pad the sequences in the same batch to the same length; let me know if I'm wrong. I probably need a data collator to solve this. Is there a built-in data collator in Hugging Face for this? If not, is there an example of how to write a custom one?
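In case it helps, this is the kind of custom collator I had in mind (just a sketch; I'm assuming each dataset item is a dict with input_ids and labels, and that padding with id 0 is acceptable for this tokenizer):
import torch

def pad_collate(batch, pad_id=0):
    # Pad every sequence in the batch to the length of the longest one.
    max_len = max(len(item["input_ids"]) for item in batch)
    input_ids, labels = [], []
    for item in batch:
        ids = list(item["input_ids"])
        ids += [pad_id] * (max_len - len(ids))  # right-pad with pad_id
        input_ids.append(ids)
        labels.append(item["labels"])
    return {
        "input_ids": torch.tensor(input_ids),
        "labels": torch.tensor(labels),
    }
I would then pass data_collator=pad_collate to the Trainer, but I'm not sure whether this is the intended approach or whether a built-in collator already covers it.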
Second, I changed the code to:
tokenizer = TransfoXLTokenizer.from_pretrained("transfo-xl-wt103")
model = TransfoXLForSequenceClassification.from_pretrained("transfo-xl-wt103")
train_texts = [train_text[:120] for train_text in train_texts]
val_texts = [val_text[:120] for val_text in val_texts]
test_texts = [test_text[:120] for test_text in test_texts]
tokenizer.pad_token = tokenizer.eos_token
train_encodings = tokenizer(train_texts, padding=True, truncation=True, max_length=120)
val_encodings = tokenizer(val_texts, padding=True, truncation=True, max_length=120)
test_encodings = tokenizer(test_texts, padding=True, truncation=True, max_length=120)
multilabel_trainer = Trainer(
model=model, # the instantiated Transformers model to be trained
tokenizer=tokenizer,
args=training_args, # training arguments, defined above
train_dataset=train_dataset, # training dataset
eval_dataset=val_dataset # evaluation dataset
)
By doing this, I think I made the sequences in the same batch the same size. However, I got the error 'AssertionError: Cannot handle batch sizes > 1 if no padding token is defined.' I checked my tokenizer:
tokenizer.pad_token returns '' and tokenizer.pad_token_id returns 0.
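My guess (and it is only a guess) is that the assertion checks the model config rather than the tokenizer, so I also tried setting the pad token id on the config before training:
# Just a guess: point the model config at the tokenizer's pad token id,
# since the assertion seems to look at config.pad_token_id.
model.config.pad_token_id = tokenizer.pad_token_id
I'm not sure whether this is the right fix for Transformer-XL, so please correct me if it isn't.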
Sometimes it also throws a CUDA out-of-memory error, even though I restarted the GPU and checked its memory with nvidia-smi before running the code.
Last, I changed the batch size to 1; it trained for 11 steps and then ran out of CUDA memory. My GPU is a P100 with 16 GB of memory, so I don't think it should fill up that quickly. (I used the same GPU to fine-tune BERT successfully.)
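For reference, this is roughly how I'm thinking of reducing memory use; the mem_len and num_labels values are just placeholders I picked, not settings I know to be right:
from transformers import TrainingArguments, TransfoXLConfig, TransfoXLForSequenceClassification

# Shrink the recurrence memory and accumulate gradients; mem_len=128 and
# num_labels=2 are guesses for illustration, not recommended values.
config = TransfoXLConfig.from_pretrained("transfo-xl-wt103", mem_len=128, num_labels=2)
model = TransfoXLForSequenceClassification.from_pretrained("transfo-xl-wt103", config=config)

training_args = TrainingArguments(
    output_dir="./results",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # simulate a larger batch without more GPU memory
)
If reducing mem_len is the wrong knob for Transformer-XL, I'd appreciate a pointer to the right one.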
I have no idea where I went wrong. Any suggestions or help would be appreciated.
For your convenience, I uploaded the notebook here.
Best!