The book reads: “Long articles pose a challenge to most transformer models since the context size is usually limited to 1,000 tokens or so, which is equivalent to a few paragraphs of text. The standard, yet crude way to deal with this for summarization is to simply truncate the texts beyond the model’s context size. Obviously there could be important information for the summary toward the end of the text, but for now we need to live with this limitation of the model architectures”.
I’m curious: what’s the current state of the art for summarizing texts that exceed the model’s context size?
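For context, one widely used workaround (beyond plain truncation) is a "map-reduce" style approach: split the text into chunks that fit the context window, summarize each chunk, then summarize the concatenated partial summaries. Below is a minimal sketch; the whitespace tokenizer and the `summarize_chunk` stand-in are placeholders of my own, where a real setup would use the model's actual tokenizer and a summarization model call.

```python
def chunk_by_tokens(text, max_tokens=1000):
    # Naive whitespace "tokenization" as a stand-in for a real tokenizer.
    words = text.split()
    return [" ".join(words[i:i + max_tokens])
            for i in range(0, len(words), max_tokens)]

def summarize_chunk(chunk):
    # Placeholder summarizer: just takes the first sentence.
    # A real pipeline would call a summarization model here instead.
    return chunk.split(". ")[0]

def map_reduce_summarize(text, max_tokens=1000):
    # Summarize each chunk, then combine the partial summaries.
    chunks = chunk_by_tokens(text, max_tokens)
    combined = " ".join(summarize_chunk(c) for c in chunks)
    # If the combined partial summaries still exceed the budget,
    # recurse until they fit in one context window.
    if len(combined.split()) > max_tokens:
        return map_reduce_summarize(combined, max_tokens)
    return combined
```

This avoids dropping the end of the document entirely, at the cost of extra model calls and some loss of cross-chunk context at chunk boundaries.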