Aggregate encoder states in encoder-decoder models for long sequences?

Hi. I would like to train a text-to-text QA model for long documents.
I was wondering if anyone has seen success aggregating the encoder states of a long document in some way (e.g. pooling) before passing them to the decoder, similar to the sliding-window technique used for, e.g., classification with BERT. I'm well aware of models like Longformer, etc., but I'm wondering whether this approach has any utility, and if not, why not?
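
To make the idea concrete, here's a minimal sketch of what I have in mind: mean-pool the encoder hidden states over fixed windows to shrink the sequence the decoder cross-attends over. The shapes, window size, and function name are just illustrative assumptions, not from any particular library.

```python
import torch
import torch.nn.functional as F

def pool_encoder_states(states: torch.Tensor, window: int) -> torch.Tensor:
    """Mean-pool encoder states over non-overlapping windows.

    states: (batch, seq_len, hidden); seq_len is padded up to a
    multiple of `window` before pooling.
    Returns: (batch, ceil(seq_len / window), hidden).
    """
    b, t, h = states.shape
    pad = (-t) % window  # pad so seq_len divides evenly into windows
    if pad:
        states = F.pad(states, (0, 0, 0, pad))
    # Reshape into (batch, n_windows, window, hidden) and average each window
    return states.view(b, -1, window, h).mean(dim=2)

# Long-document encoder output: 4096 tokens, hidden size 768
encoder_states = torch.randn(2, 4096, 768)
pooled = pool_encoder_states(encoder_states, window=8)
print(pooled.shape)  # torch.Size([2, 512, 768])
```

The pooled states would then be passed as the cross-attention memory in place of the full-length encoder output, cutting the decoder's cross-attention cost by the window factor.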