Modeling long sequences

Hey there,

What’s the recommended way for modeling longer sequences in NLP using BERT/BART type models?

Say on the order of ~10k+ tokens.

Tasks are potentially autoregressive, and won’t have a fixed size output vector.

Thanks!