How to preserve HTML when processing (paraphrasing)

Is there a way to preserve the HTML when using the Accelerated Inference API? I am trying to put strings, split on HTML elements, through the PEGASUS paraphraser. The problem is that the HTML is removed in the output. How would you go about preserving the HTML?

Parse the HTML while keeping the index of each text part, and then make the prediction. After getting the model output, place each result back at its index. This won't work without valid tags, though.
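A minimal sketch of this idea in Python, using only the standard library. It assumes reasonably well-formed markup and finds tags with a simple regex rather than a full parser. `text_spans`, `rewrite_text`, and the `transform` callback are hypothetical names; `transform` stands in for whatever function sends a piece of text to PEGASUS and returns the paraphrase.

```python
import re
from typing import Callable, List, Tuple

# Matches one HTML comment or tag. Assumes well-formed markup: a '>' inside
# an attribute value would break this simple pattern.
TAG_RE = re.compile(r"<!--.*?-->|<[^>]+>", re.DOTALL)


def text_spans(html: str) -> List[Tuple[int, int]]:
    """Return (start, end) offsets of every non-blank text run between tags."""
    spans = []
    pos = 0
    for m in TAG_RE.finditer(html):
        if html[pos:m.start()].strip():
            spans.append((pos, m.start()))
        pos = m.end()
    if html[pos:].strip():
        spans.append((pos, len(html)))
    return spans


def rewrite_text(html: str, transform: Callable[[str], str]) -> str:
    """Apply `transform` to each text run and splice it back at its recorded offsets."""
    pieces = []
    last = 0
    for start, end in text_spans(html):
        pieces.append(html[last:start])  # markup between text runs, kept verbatim
        chunk = html[start:end]
        core = chunk.strip()
        lead = chunk[: len(chunk) - len(chunk.lstrip())]  # leading whitespace
        trail = chunk[len(chunk.rstrip()):]               # trailing whitespace
        pieces.append(lead + transform(core) + trail)
        last = end
    pieces.append(html[last:])  # any markup after the final text run
    return "".join(pieces)
```

For example, `rewrite_text('<div><p>Hello world.</p><p>Bye.</p></div>', str.upper)` uppercases each text run and leaves every tag in place; a real paraphrasing function plugs in the same way. This sketch treats `<script>`/`<style>` contents and entities such as `&amp;` as ordinary text.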

Hey, thanks for the reply. Could you elaborate on this? The document I'm trying to process is scraped, and it just contains

etc… Could you send me a little more info so I can google it and figure out how to do this? I'm trying to put whole articles through the PEGASUS paraphraser while keeping the tags.

Beautiful Soup would be a good one: extract the text from tags like div or p while keeping the index of their positions in the text.
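A rough sketch of the Beautiful Soup approach, assuming `bs4` is installed. Instead of tracking numeric indexes, it edits the parsed tree in place: each text node directly inside the listed tags is replaced with the transformed string, so the surrounding markup is serialized back around it. `paraphrase_html` and its `tags` default are illustrative choices, and `transform` is a placeholder for the PEGASUS call.

```python
from bs4 import BeautifulSoup
from bs4.element import Comment


def paraphrase_html(html, transform, tags=("p", "div", "h1", "h2", "h3", "li")):
    """Replace the direct text children of the given tags with transform(text)."""
    soup = BeautifulSoup(html, "html.parser")
    for tag in soup.find_all(list(tags)):
        # Only direct children, so nested listed tags are not processed twice.
        for node in tag.find_all(string=True, recursive=False):
            if isinstance(node, Comment) or not node.strip():
                continue  # leave comments and whitespace-only nodes alone
            node.replace_with(transform(str(node)))
    return str(soup)
```

One trade-off: text split by inline tags (e.g. `Hi <a href="x">link</a>`) reaches the model in fragments, and text inside tags not listed in `tags` is left unchanged. Beautiful Soup may also normalize some markup, such as attribute quoting, when serializing.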