Bart summarization

Good morning/evening
I am trying to understand how DistilBART generates summaries. What is the logic behind fine-tuning it with texts and their reference summaries? How does it learn to summarize to a specified length, using new words? The way I see it: I feed a text into the model, it gets encoded and then decoded with only the tokens containing the important information? How does the model spot the good sentence tokens?


You should read more about “Sequence to Sequence”.

BART is a seq2seq model: the input text is encoded with attention, and then the output text is generated token by token, with attention over the input and the output generated so far. Since the output is generated token by token, we can choose how many tokens we want to generate.
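To make the "token by token" idea concrete, here is a toy sketch of that generation loop. The `encode` and `next_token` functions are stand-ins (hypothetical, not the real BART encoder/decoder), but the structure is the same: encode the input once, then generate one token at a time, stopping at an end-of-sequence token or at the length limit.

```python
# Toy sketch of seq2seq generation. `encode` and `next_token` are
# stand-ins for the real encoder and decoder + softmax; a real model
# scores the whole vocabulary with attention over the encoded input
# and the tokens generated so far.

def encode(input_tokens):
    # Stand-in for BART's encoder: a real model returns hidden states.
    return {"encoded_input": input_tokens}

def next_token(encoded, generated):
    # Stand-in for the decoder: here we just pick the next input token
    # not yet emitted, to keep the example deterministic and runnable.
    remaining = [t for t in encoded["encoded_input"] if t not in generated]
    return remaining[0] if remaining else "<eos>"

def generate(input_tokens, max_length):
    encoded = encode(input_tokens)        # encoder runs once
    generated = []
    while len(generated) < max_length:    # the length limit is just a loop bound
        tok = next_token(encoded, generated)
        if tok == "<eos>":                # model can also stop early
            break
        generated.append(tok)
    return generated

print(generate(["the", "cat", "sat"], max_length=2))  # ['the', 'cat']
```

This is why you can request a summary of a given length: the loop simply stops once `max_length` tokens have been produced (in the `transformers` library this corresponds to the `max_length`/`min_length` generation arguments).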


Yes, as @colanim said. And learning from the masters is one of the best ways :slight_smile:
Here's Andrew Ng's video series on seq2seq models:

Even though it's not a Transformer, the big-picture concepts are applicable.


Thank you very much, I will!
