How to do domain adaptive pretraining of Pegasus?

I’d like to continue pretraining Pegasus on in-domain data (see [2004.10964] Don't Stop Pretraining: Adapt Language Models to Domains and Tasks), and I’m trying to work out whether I can do this with Hugging Face.

Pegasus is pretrained with the gap sentences generation (GSG) objective, so I am wondering if I can approximate continued pretraining as follows:

Given unlabelled documents D = [d_1, d_2, …, d_n]:

1. For each d_i, extract the m “important” sentences x_1, …, x_m.
2. Concatenate x_1, …, x_m to obtain s_i, and use s_i as an approximation of a summary of d_i.
3. Remove each x_j from d_i to obtain z_i, so that recovering s_i from z_i approximates the task of summarising d_i.
4. This yields a labelled dataset [(z_1, s_1), (z_2, s_2), …, (z_n, s_n)], on which I would fine-tune, predicting s_i from z_i (a rough sketch of steps 1–3 is below).
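
Concretely, I was imagining something like the sketch below for building the (z_i, s_i) pairs. I score sentence “importance” with ROUGE-1 against the rest of the document (roughly the “principal” sentence selection in the PEGASUS paper); the regex sentence splitter, the `make_pseudo_pair` helper, and m = 3 are just placeholders I made up. Note that the original GSG objective replaces the selected sentences with a mask token rather than deleting them; this follows the “remove” variant described above.

```python
# Sketch: build pseudo-summarisation pairs (z_i, s_i) from unlabelled documents.
# Sentence importance here is ROUGE-1 F1 of each sentence vs. the rest of the
# document; the sentence splitter and m are crude placeholders.
import re
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1"], use_stemmer=True)

def split_sentences(doc: str) -> list[str]:
    # crude sentence segmentation; swap in nltk/spacy for real use
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()]

def make_pseudo_pair(doc: str, m: int = 3) -> tuple[str, str]:
    sents = split_sentences(doc)
    if len(sents) <= m:                 # too short to mask anything useful
        return doc, ""
    # score each sentence against the remainder of the document
    scores = []
    for i, s in enumerate(sents):
        rest = " ".join(sents[:i] + sents[i + 1:])
        scores.append(scorer.score(rest, s)["rouge1"].fmeasure)
    top = set(sorted(range(len(sents)), key=lambda i: scores[i], reverse=True)[:m])
    s_i = " ".join(sents[i] for i in sorted(top))                      # pseudo-summary
    z_i = " ".join(sents[i] for i in range(len(sents)) if i not in top)  # masked document
    return z_i, s_i

docs = ["..."]  # unlabelled in-domain documents
pairs = [p for p in (make_pseudo_pair(d) for d in docs) if p[1]]
```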

After this, I would fine-tune on my dataset of documents with real, human-written summaries.
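
If this is a reasonable approach, I assume the pseudo-summary stage is just an ordinary seq2seq fine-tuning run, and the second stage would reuse the same code with (document, human-written summary) pairs. Something like the sketch below is what I had in mind; the checkpoint name, max lengths, and hyperparameters are placeholders, not recommendations.

```python
# Sketch: run the GSG-style stage as a standard seq2seq fine-tuning job with
# Hugging Face. The second stage (real summaries) would reuse this code with
# (document, human_summary) pairs instead of (z_i, s_i).
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    PegasusForConditionalGeneration,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "google/pegasus-large"   # or a checkpoint closer to the target domain
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = PegasusForConditionalGeneration.from_pretrained(model_name)

# the (z_i, s_i) pairs from the sketch above; dummy values here for illustration
pairs = [("masked document text ...", "pseudo-summary text ...")]
ds = Dataset.from_dict({
    "document": [z for z, s in pairs],
    "summary":  [s for z, s in pairs],
})

def preprocess(batch):
    model_inputs = tokenizer(batch["document"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=256, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized = ds.map(preprocess, batched=True, remove_columns=ds.column_names)

args = Seq2SeqTrainingArguments(
    output_dir="pegasus-domain-adapted",
    per_device_train_batch_size=2,
    learning_rate=5e-5,
    num_train_epochs=1,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
model.save_pretrained("pegasus-domain-adapted")  # starting point for fine-tuning on real summaries
```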

Would this lead to the domain-adaptive pretraining I am seeking, or would it give me something else?