How to train with very long sequences?

Hi, I’m trying to fine-tune a classifier on very long sequences that can reach 6,000,000 characters or more. I started with the GPT-2 model and had to truncate the sequences to the model’s limit, which is 1024 tokens if I understood correctly.
Is there a way for me to train this or a different model on longer sequences?
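For context, this is roughly the kind of truncation I mean (a minimal sketch assuming the Hugging Face transformers tokenizer API; the text is just a placeholder):

```python
from transformers import GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")

text = "a very long document " * 10000  # placeholder for one of my real sequences

# GPT-2's context window is 1024 tokens, so everything past that gets dropped.
enc = tokenizer(text, truncation=True, max_length=1024, return_tensors="pt")
print(enc["input_ids"].shape)  # (1, 1024)
```

Truncating like this throws away almost all of the document, which is why I’m looking for alternatives.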

There is Longformer, a transformer designed for long sequences. I think in their paper they claim to be able to attend to tens of thousands of tokens.
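A minimal sketch of how you might set it up for classification with the transformers library (the public `allenai/longformer-base-4096` checkpoint supports up to 4096 tokens out of the box; `num_labels=2` is just an assumption for your task):

```python
from transformers import (
    LongformerTokenizerFast,
    LongformerForSequenceClassification,
)

# Public checkpoint whose position embeddings cover 4096 tokens.
tokenizer = LongformerTokenizerFast.from_pretrained("allenai/longformer-base-4096")
model = LongformerForSequenceClassification.from_pretrained(
    "allenai/longformer-base-4096",
    num_labels=2,  # assumption: binary classification; set to your label count
)

long_text = "a very long document " * 10000  # placeholder for a real sequence

# Truncation is still needed: 4096 tokens is far below 6,000,000 characters,
# so you may have to chunk each document and aggregate the predictions.
enc = tokenizer(long_text, truncation=True, max_length=4096, return_tensors="pt")
outputs = model(**enc)
print(outputs.logits)
```

Note that even 4096 tokens won’t cover millions of characters, so for your data you’d likely combine Longformer with a chunking strategy.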

Thank you!