BertSum extractive summarization

In the BertSum paper, they say that summarization happens on top of the BERT output: they stack summarization-specific layers (two Transformer layers worked best) and then add a sigmoid classifier that scores each sentence to decide whether it should be included in the summary. However, their code also contains classes for clustering. What are these classes and methods used for?
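For reference, this is roughly how I picture the final selection step described above, as a minimal sketch and not the authors' code. It assumes each sentence gets one independent logit from the summarization layers, that the logit is squashed with a sigmoid, and that the top `k` sentences are kept. The function name `select_sentences`, the example logits, and `k=3` are my own placeholders. Nothing in this step involves clustering, which is why those classes confuse me.

```python
import math


def sigmoid(x: float) -> float:
    """Map a raw logit to a score in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def select_sentences(logits, k=3):
    """Return the indices of the k highest-scoring sentences, in document order.

    `logits` is a hypothetical list with one raw score per sentence,
    standing in for the output of the summarization layers.
    """
    scores = [sigmoid(z) for z in logits]
    # Rank sentence indices by score, highest first.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    # Keep the top k and restore the original sentence order.
    return sorted(ranked[:k])


# Made-up logits for a five-sentence document.
logits = [2.1, -0.5, 0.3, 1.7, -1.2]
print(select_sentences(logits, k=3))
```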