How to use huge target data without source data

PrakashFDS · May 2, 2022, 7:18am

Hi,

We are working on a summarization problem. The aim of the project is to take the text of a company's webpage as input and produce a one-sentence summary that describes the company's business.

We have a parallel corpus of about 2,000 pairs, each consisting of an input webpage and its expected summary. We are fine-tuning a summarization model (sshleifer/distilbart-xsum-12-6) on these 2,000 samples, and we are getting decent results with this approach.
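For reference, our current setup looks roughly like the sketch below. It is a minimal outline rather than our exact training script: the helper names (`split_pairs`, `fine_tune`), the output directory, the validation fraction, and all hyperparameters (epochs, batch size, learning rate, maximum lengths) are illustrative assumptions. Only the checkpoint name comes from the setup described above.

```python
import random

# Checkpoint named in the post; everything else below is an assumption.
MODEL_NAME = "sshleifer/distilbart-xsum-12-6"


def split_pairs(pairs, val_fraction=0.1, seed=0):
    """Shuffle (webpage_text, summary) pairs and hold out a validation slice."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, int(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def fine_tune(train_pairs, val_pairs, output_dir="distilbart-company-summaries"):
    """Fine-tune the checkpoint on (webpage_text, summary) pairs."""
    # Imported here so split_pairs can be used without these packages installed.
    from datasets import Dataset
    from transformers import (
        AutoModelForSeq2SeqLM,
        AutoTokenizer,
        DataCollatorForSeq2Seq,
        Seq2SeqTrainer,
        Seq2SeqTrainingArguments,
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

    def to_dataset(pairs):
        ds = Dataset.from_dict(
            {"text": [p[0] for p in pairs], "summary": [p[1] for p in pairs]}
        )

        def tokenize(batch):
            # Truncate long webpages to the encoder limit (assumed 1024 tokens).
            enc = tokenizer(batch["text"], max_length=1024, truncation=True)
            # Summaries are a single sentence, so a short label length is assumed.
            enc["labels"] = tokenizer(
                text_target=batch["summary"], max_length=64, truncation=True
            )["input_ids"]
            return enc

        return ds.map(tokenize, batched=True, remove_columns=["text", "summary"])

    args = Seq2SeqTrainingArguments(
        output_dir=output_dir,
        num_train_epochs=3,             # assumed value
        per_device_train_batch_size=4,  # assumed value
        learning_rate=3e-5,             # assumed value
    )
    trainer = Seq2SeqTrainer(
        model=model,
        args=args,
        train_dataset=to_dataset(train_pairs),
        eval_dataset=to_dataset(val_pairs),
        # Pads inputs and labels per batch.
        data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    )
    trainer.train()
    return trainer
```

With the pairs loaded as a list of `(webpage_text, summary)` tuples, this would be called as `train, val = split_pairs(pairs)` followed by `fine_tune(train, val)`.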

Apart from the 2,000-pair parallel corpus, we have a huge set of target summaries (around 500k) with no corresponding source webpages. How can we use these target-only summaries to improve the summarization model?
