Fine-tuning BERT with sequences longer than 512 tokens

In my experience, Longformer and BigBird require a lot of GPU memory. I tried using them on a 14 GB GPU, but I was limited to batch_size=1, which made training extremely slow and yielded rather poor results.
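For reference, the batch_size=1 bottleneck can sometimes be softened with gradient accumulation, gradient checkpointing, and mixed precision. Below is a minimal sketch assuming the Hugging Face transformers Trainer; the model checkpoint, the tiny in-memory dataset, and all hyperparameters are illustrative placeholders, not something from my actual setup.

```python
# Sketch: fitting Longformer fine-tuning onto a ~14 GB GPU.
# Assumptions: transformers + datasets installed; data and hyperparameters
# below are placeholders to be replaced with your own task.
from datasets import Dataset
from transformers import (
    LongformerForSequenceClassification,
    LongformerTokenizerFast,
    Trainer,
    TrainingArguments,
)

model_name = "allenai/longformer-base-4096"
tokenizer = LongformerTokenizerFast.from_pretrained(model_name)
model = LongformerForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Placeholder data: swap in your own long documents and labels.
train_ds = Dataset.from_dict(
    {"text": ["a very long document ..."] * 8, "label": [0, 1] * 4}
)

def tokenize(batch):
    # Pad/truncate to the model's 4096-token window.
    return tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=4096
    )

train_ds = train_ds.map(tokenize, batched=True)

training_args = TrainingArguments(
    output_dir="longformer-finetune",   # hypothetical output path
    per_device_train_batch_size=1,      # all that fits at 4096 tokens
    gradient_accumulation_steps=16,     # simulates an effective batch of 16
    gradient_checkpointing=True,        # recompute activations to save memory
    fp16=True,                          # halve activation memory on supported GPUs
    learning_rate=2e-5,
    num_train_epochs=3,
    logging_steps=10,
)

trainer = Trainer(model=model, args=training_args, train_dataset=train_ds)
trainer.train()
```

This doesn't reduce per-step memory use of the attention itself, but accumulating gradients over several steps at least avoids updating with noisy single-example gradients, which may be part of why batch_size=1 runs converge poorly.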