Bart input confusion

In BART, for the summarization task, the input length is 1024 (1024 tokens). What does this input represent? For example, if I have a document with sentences s1, s2, …, s500, does this mean we feed the document sentence by sentence, or does the whole document have to go in as a single 1024-token input (all sentences must fit, with truncation)? If it's the latter, doesn't that cause information loss?
And if it's sequence by sequence, say 20 sentences at a time out of the 500, will the output at the top of the encoder change each time?
To be honest I’m having a difficult time imagining how the encoder is processing the document.

Hi @Hildweig

For BART, the maximum input length is 1024 tokens. For simplicity you can think of a token as a word (although a single word can also be split into multiple tokens). It's not sentences.
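
You can see this directly with the tokenizer. A quick sketch, assuming the `facebook/bart-large-cnn` checkpoint just as an example (any BART tokenizer behaves the same way):

```python
from transformers import BartTokenizer

# Example checkpoint; used here only to illustrate word vs. token
tok = BartTokenizer.from_pretrained("facebook/bart-large-cnn")

# Longer or rarer words are typically split into several sub-word tokens,
# so the number of tokens is usually larger than the number of words.
print(tok.tokenize("Summarization with transformers is fun"))
```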

Here, a document means a sequence of at most 1024 tokens. Processing sequences longer than that is still a topic of ongoing research.
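
To make the limit concrete, here is a minimal sketch (again assuming `facebook/bart-large-cnn`; the toy document is just repeated filler text) that compares the tokenized length of a long document against the model's position limit:

```python
from transformers import BartTokenizer, BartConfig

tokenizer = BartTokenizer.from_pretrained("facebook/bart-large-cnn")
config = BartConfig.from_pretrained("facebook/bart-large-cnn")

# Stand-in for a real 500-sentence document
document = " ".join(["This is one sentence of the document."] * 500)

ids = tokenizer(document)["input_ids"]
print(len(ids))                         # far more than 1024 for a long document
print(config.max_position_embeddings)   # 1024 -- the encoder's hard limit
```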

And I would recommend reading the original Transformer paper (Attention Is All You Need) to get an idea of how a sequence is processed by the encoder, or The Illustrated Transformer blog post.

So does it get truncated? If yes, is there a special method they use for truncation?