How to preserve HTML when processing (paraphrasing)

BrianSerp · November 20, 2021, 10:40am

Is there a way to preserve the HTML when using accelerated inference? I am trying to put strings split on HTML elements through a PEGASUS paraphraser. The problem is that the HTML is removed in the output. How would you go about preserving the HTML?

anuragshas · November 21, 2021, 5:01am

Parse the HTML while keeping track of the index of each part, then run the prediction on the text parts only. After getting the model output, place each result back at its original index. This won’t work without valid tags, though.
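For anyone who wants to see the index idea spelled out, here is a minimal sketch using only the standard library. It assumes well-formed tags with no `>` inside attribute values, comments, or scripts. `paraphrase` is a hypothetical stand-in for the real model call (it just upper-cases the text so the example runs without downloading anything), and `paraphrase_html` and `TAG_RE` are illustrative names, not part of any library.

```python
import re

# Capturing group: re.split keeps the tags in the result, at odd indexes.
TAG_RE = re.compile(r"(<[^>]+>)")


def paraphrase(text: str) -> str:
    """Hypothetical stand-in for the PEGASUS call; swap in the real model here."""
    return text.upper()


def paraphrase_html(html: str, rewrite=paraphrase) -> str:
    """Rewrite only the text between tags and put it back at the same index."""
    parts = TAG_RE.split(html)
    for i, part in enumerate(parts):
        is_tag = i % 2 == 1  # odd indexes hold the captured tags
        if is_tag or not part.strip():
            continue  # leave tags and whitespace-only pieces untouched
        parts[i] = rewrite(part)
    return "".join(parts)


print(paraphrase_html("<p>Hello <b>world</b></p>"))
```

One thing to watch: a real paraphraser may drop the leading or trailing space of a fragment such as `"Hello "`, so you may need to re-attach that whitespace before joining the parts.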

BrianSerp · November 21, 2021, 5:28am

Hey, thanks for the reply. Could you elaborate on this? The document I’m trying to process is scraped and it just contains

etc… Can you send me a little more info so I can google it and figure out how to do this? I’m trying to put whole articles through a PEGASUS paraphraser while keeping the tags.

anuragshas · November 21, 2021, 7:31am

Beautiful Soup would be a good choice: extract the text from tags like div or p while keeping the index of their positions in the text.
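A rough sketch of how that might look with Beautiful Soup (`pip install beautifulsoup4`). Instead of tracking indexes by hand, it replaces each text node in place, so the parse tree keeps the positions for you. `paraphrase` is again a hypothetical placeholder for the model call, and `paraphrase_blocks` and its `tags` parameter are names made up for this example.

```python
from bs4 import BeautifulSoup, Comment


def paraphrase(text: str) -> str:
    """Hypothetical stand-in for the PEGASUS call; swap in the real model here."""
    return text.upper()


def paraphrase_blocks(html: str, rewrite=paraphrase, tags=("p", "div")) -> str:
    """Rewrite text nodes that sit inside the given tags; keep all markup."""
    soup = BeautifulSoup(html, "html.parser")
    # find_all returns a plain list, so replacing nodes while looping is safe.
    for node in soup.find_all(string=True):
        if isinstance(node, Comment) or not node.strip():
            continue  # skip HTML comments and whitespace-only nodes
        if node.find_parent(list(tags)) is None:
            continue  # text outside the chosen tags stays as it is
        node.replace_with(rewrite(str(node)))
    return str(soup)


print(paraphrase_blocks("<h1>Title</h1><p>Hello <b>world</b></p>"))
```

Because each text node is sent to the model separately, a sentence broken up by inline tags like `<b>` gets paraphrased in pieces. If that hurts quality, one option is to paraphrase whole paragraphs via `get_text()` and accept losing the inline markup.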
