Summarization task, looking for clarifications before getting started

Hi @neuralpat, no bother at all :slight_smile:

If I understood your original aim correctly, you’d like to perform summarization, right? As far as I know, you won’t be able to use the xlm-r model fine-tuned on token classification, since what you really need is a language modelling head to generate the summary.

How long are your documents? Depending on time / cost, I would still be tempted to run an experiment with the encoder-decoder approach just to get a feel for how well this baseline performs on your dataset. For example, the CNN / DailyMail dataset has articles that are longer than most Transformer models’ context size, yet the summaries are not so bad.
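If you want to try that baseline quickly, here’s a minimal sketch with the transformers summarization pipeline - the distilbart-cnn checkpoint is just an example I picked, swap in whatever seq2seq model suits you:

```python
# Minimal baseline sketch, assuming `transformers` is installed.
# The checkpoint below is only an example of a CNN/DailyMail-tuned model.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

document = "..."  # replace with one of your documents

# truncation=True clips inputs that exceed the model's context size,
# which is exactly the limitation discussed above for long documents
summary = summarizer(document, max_length=128, min_length=30, truncation=True)
print(summary[0]["summary_text"])
```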

If length is really an issue, then you might want to check out the Longformer Encoder-Decoder (LED) model: allenai/led-base-16384 · Hugging Face, which can process up to 16k tokens :exploding_head:
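Just to show the mechanics, here’s a rough sketch of running LED on a long document. Keep in mind the base checkpoint isn’t fine-tuned for summarization, so you’d still want to fine-tune it on your data before judging the output; the generation parameters are only illustrative:

```python
# Rough sketch for LED on long inputs, assuming `transformers` and `torch`.
# allenai/led-base-16384 accepts inputs up to 16k tokens, but note it is a
# pretrained (not summarization-fine-tuned) checkpoint.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("allenai/led-base-16384")
model = AutoModelForSeq2SeqLM.from_pretrained("allenai/led-base-16384")

long_document = "..."  # a document far longer than the usual 512/1024 tokens

inputs = tokenizer(long_document, max_length=16384, truncation=True, return_tensors="pt")

# LED uses local attention by default; putting global attention on the first
# token is the usual recommendation for summarization-style tasks
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    **inputs,
    global_attention_mask=global_attention_mask,
    max_length=256,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```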

There’s also a long thread here with a discussion related to your issue, so you might find some relevant ideas there: Summarization on long documents

What is generally true is that pretrained checkpoints can be fine-tuned on a variety of downstream tasks via transfer learning - perhaps this is what you had in mind?
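If that’s the route you take, a fine-tuning loop with Seq2SeqTrainer could look roughly like this - the toy dataset, column names and hyperparameters below are just placeholders for your own data:

```python
# Hedged fine-tuning sketch, assuming `transformers` (>=4.21 for text_target)
# and `datasets` are installed. Dataset contents and hyperparameters are toys.
from datasets import Dataset, DatasetDict
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

checkpoint = "allenai/led-base-16384"  # or any other seq2seq checkpoint
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

# Tiny toy dataset just to make the sketch self-contained; replace with your data
dataset = DatasetDict({
    "train": Dataset.from_dict(
        {"document": ["long article text ..."], "summary": ["short summary ..."]}
    ),
    "validation": Dataset.from_dict(
        {"document": ["another article ..."], "summary": ["another summary ..."]}
    ),
})

def preprocess(batch):
    # Tokenize inputs and target summaries; adjust max_length to your documents
    model_inputs = tokenizer(batch["document"], max_length=4096, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = dataset.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="summarization-finetuned",
    per_device_train_batch_size=1,
    num_train_epochs=3,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["validation"],
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```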
