Longformer speed compared to BERT model

We are trying to use Longformer and BERT models for multi-label classification of documents.

When we use the BERT model (BertForSequenceClassification) with max length 512 and batch size 8, each epoch takes approximately 30 minutes.

When we use Longformer (LongformerForSequenceClassification with 'allenai/longformer-base-4096' and gradient_checkpointing=True) with max length 4096, batch size 1, and 8 gradient accumulation steps, each epoch takes approximately 12 hours.

Is this expected, or are we missing something?
Is there anything we can try to make training faster?
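For a rough sanity check, here is a back-of-envelope estimate of the expected slowdown. It assumes Longformer's sliding-window attention uses the default window of 512 tokens, that both models have the same hidden size and layer count, and that gradient checkpointing adds roughly 40% overhead from the extra forward pass; the helper function and the 1.4 factor are illustrative assumptions, not measured values.

```python
def rel_cost(seq_len, batch, window=None, checkpoint=1.0):
    """Relative per-step cost: linear (FFN) token work + attention work.

    Full self-attention scales with seq_len^2 per sequence; a sliding
    window of size w scales with seq_len * w instead. The `checkpoint`
    factor crudely models gradient-checkpointing overhead (assumption).
    """
    tokens = seq_len * batch
    attn = tokens * (window if window is not None else seq_len)
    return (tokens + attn) * checkpoint

# BERT: max length 512, batch size 8, full attention.
bert = rel_cost(512, 8)

# Longformer: max length 4096, batch size 1, window 512,
# ~1.4x overhead assumed for gradient checkpointing.
longf = rel_cost(4096, 1, window=512, checkpoint=1.4)

# Longformer needs 8 gradient-accumulation steps per effective batch of 8.
ratio = (8 * longf) / bert
print(f"expected slowdown ≈ {ratio:.1f}x")  # ≈ 11.2x under these assumptions
```

This crude model predicts roughly an 11x slowdown, so a 24x gap suggests additional overhead, likely from the batch size of 1 underutilizing the GPU at each step.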


I was using LED and found it is also roughly 10 times slower than the BART model.