How to train with very long sequences?

Hi, I’m trying to fine-tune a classifier on very long sequences that can reach 6,000,000 characters or more. I started with the GPT-2 model and had to truncate the sequences to the model’s limit, which is 1024 tokens if I understood correctly.
Is there a way for me to train this or a different model on longer sequences?
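For context, this is roughly the kind of truncation I mean (a minimal sketch assuming the Hugging Face transformers tokenizer API; the text is just a placeholder):

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = "a very long document " * 10000  # placeholder for one of my real sequences

# GPT-2's context window is 1024 tokens, so everything past that gets dropped.
enc = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt")
print(enc["input_ids"].shape)  # (1, 1024)
```

Truncating like this throws away almost all of the document, which is why I’m looking for alternatives.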

There is Longformer, a transformer designed for long sequences. I think in their paper they claim to be able to attend to tens of thousands of tokens.
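A minimal sketch of how you might set it up for classification with the transformers library (the public `allenai/longformer-base-4096` checkpoint supports up to 4096 tokens out of the box; `num_labels=2` is just an assumption for your task):

```python
from transformers import (
    LongformerTokenizerFast,
    LongformerForSequenceClassification,
)

# Public checkpoint whose position embeddings cover 4096 tokens.
tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
model = LongformerForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096",
    num_labels=2,  # assumption: binary classification; set to your label count
)

long_text = "a very long document " * 10000  # placeholder for a real sequence

# Truncation is still needed: 4096 tokens is far below 6,000,000 characters,
# so you may have to chunk each document and aggregate the predictions.
enc = tokenizer(long_text, truncation=True, max_length=4096, return_tensors="pt")
outputs = model(**enc)
print(outputs.logits)
```

Note that even 4096 tokens won’t cover millions of characters, so for your data you’d likely combine Longformer with a chunking strategy.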

Thank you!