Longformer for text summarization

Hello! :slight_smile: Does anyone know how to summarize long documents/news articles using the Longformer library? I am aware that with T5 the token limit is 512.

I would really appreciate any help in this area! Thank you :slight_smile:

Hi, it’s possible to use Longformer for summarization. The way it’s done now is to take a BART model and replace its self-attention with Longformer’s sliding-window attention so that it can take longer sequences. Check these two issues, first, second, and this branch of the longformer repo.
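
To make the sliding-window idea a bit more concrete, here is a minimal pure-Python sketch of which positions may attend to which. It is only an illustration: the function names `sliding_window_mask` and `count_allowed` are made up for this post, and the actual Longformer code uses banded matrix operations instead of building a full matrix like this.

```python
def sliding_window_mask(seq_len, window, global_positions=()):
    """Return a seq_len x seq_len matrix of booleans.

    mask[i][j] is True when query position i may attend to key position j.
    Each token sees window // 2 neighbours on each side; positions listed in
    global_positions attend to everything and are attended to by everything.
    """
    half = window // 2
    global_set = set(global_positions)
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            is_local = abs(i - j) <= half
            is_global = i in global_set or j in global_set
            row.append(is_local or is_global)
        mask.append(row)
    return mask


def count_allowed(mask):
    """Count the (query, key) pairs that are allowed to interact."""
    return sum(sum(row) for row in mask)


if __name__ == "__main__":
    # Print the pattern for 8 tokens, a window of 2, and one global token.
    for row in sliding_window_mask(8, window=2, global_positions=(0,)):
        print("".join("x" if allowed else "." for allowed in row))
```

With a fixed window, the number of allowed pairs grows roughly linearly with the sequence length instead of quadratically, which is why the model can take much longer inputs.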

Hi, I followed the example from the branch of the longformer repo, but it seems that the final output is a tensor instead of words/text. How can I convert it into words?

Note that these models are not yet fine-tuned for long-document summarization; you’ll need to fine-tune them yourself or wait until someone does. And yes, the model returns a tensor. To generate text you’ll need to use the generate method and then decode the resulting token IDs with the tokenizer.

Here’s a nice blog post on the generate method

the links are broken :confused:

Hey @valhalla. Hope you’re well. In your earlier comment here you mention that Longformer for summarisation takes the BART model and replaces its self-attention. I was under the impression that this model was based on RoBERTa. Can you confirm whether there is a Longformer model based on BART and, if so, where it is on the hub?

Hi,

Longformer is an encoder-only Transformer (similar to BERT/RoBERTa); it only has a different attention mechanism, which allows it to be used on longer sequences.

The authors also released LED (Longformer Encoder-Decoder), which is a seq2seq model (like BART and T5) but with Longformer as the encoder, which allows it to be used, for instance, to summarize long documents or translate long texts.

Weights are on the hub: Models - Hugging Face
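
For the earlier question about getting text instead of a tensor, a rough sketch with LED in `transformers` could look like the following. The checkpoint name `allenai/led-base-16384`, the helper name `summarize`, and the beam/length settings are my own assumptions for illustration, not something taken from the links above. Since that base checkpoint is not fine-tuned for summarization, the output may not be a usable summary until you fine-tune it.

```python
def summarize(text, model_name="allenai/led-base-16384", max_new_tokens=128):
    """Generate a summary string for `text` with an LED checkpoint (sketch)."""
    # Imported here so that defining the function does not load the libraries.
    import torch
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

    # Tokenize; 16384 is the input length this assumed checkpoint is named for.
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=16384)

    # LED takes an extra mask marking tokens with global attention.
    # Here only the first token is global (an illustrative choice).
    global_attention_mask = torch.zeros_like(inputs["input_ids"])
    global_attention_mask[:, 0] = 1

    with torch.no_grad():
        # generate() returns a tensor of token IDs, not text.
        output_ids = model.generate(
            inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            global_attention_mask=global_attention_mask,
            num_beams=2,
            max_new_tokens=max_new_tokens,
        )

    # Decoding with the tokenizer turns the token IDs back into a string.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
```

Calling `summarize(long_article)` downloads the weights on first use, so expect it to take a while. The key step for the "tensor instead of words" problem is the final `batch_decode` call.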

Amazing, that’s very helpful, thank you. I can see on that link that Allen AI’s LED model is based on bart-base, which is ideal. If I were to try to convert bart-large to an LED, would this notebook still be the right approach, or is it for encoder-only models?

I don’t have access to this notebook.

Okay, no problem. Thanks for your help!