I think the best approach is to go with Sentence-BERT. You can indeed use your own model; just try to reproduce what they do in the paper.
You add a pooling layer on top of your model's output. From the paper:
We experiment with three pooling strategies: Using the output of the CLS-token, computing the mean of all output vectors (MEAN-strategy), and computing a max-over-time of the output vectors (MAX-strategy). The default configuration is MEAN.
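As a rough sketch of the MEAN-strategy (not the authors' code), this is roughly what mean pooling over your model's token embeddings looks like. It assumes a Hugging Face transformers model; the model name and sentences are just placeholders for your own:

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Placeholder checkpoint; swap in your own model here
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

sentences = ["This is an example sentence.", "Each sentence gets one vector."]
encoded = tokenizer(sentences, padding=True, truncation=True, return_tensors="pt")

with torch.no_grad():
    token_embeddings = model(**encoded).last_hidden_state  # (batch, seq_len, hidden)

# MEAN-strategy: average only over real tokens, masking out padding
mask = encoded["attention_mask"].unsqueeze(-1).float()      # (batch, seq_len, 1)
sentence_embeddings = (token_embeddings * mask).sum(1) / mask.sum(1).clamp(min=1e-9)
print(sentence_embeddings.shape)  # (2, hidden_size)
```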
Finally, you might also want to fine-tune your model so the embeddings can actually be compared with cosine similarity:
In order to fine-tune BERT / RoBERTa (your model), we create siamese and triplet networks (Schroff et al., 2015) to update the weights such that the produced sentence embeddings are semantically meaningful and can be compared with cosine-similarity.
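If you don't want to implement the siamese setup yourself, the sentence-transformers library lets you wrap your own transformer with a mean-pooling layer and fine-tune it with a cosine-similarity loss. A minimal sketch, assuming that library; the model name, example pairs, and hyperparameters are placeholders, not values from the paper:

```python
from torch.utils.data import DataLoader
from sentence_transformers import SentenceTransformer, models, losses, InputExample

# Your own model wrapped with a MEAN pooling layer
word_embedding = models.Transformer("bert-base-uncased")
pooling = models.Pooling(word_embedding.get_word_embedding_dimension(),
                         pooling_mode="mean")
model = SentenceTransformer(modules=[word_embedding, pooling])

# Sentence pairs labelled with a similarity score in [0, 1] (toy examples)
train_examples = [
    InputExample(texts=["A man is eating food.", "A man eats something."], label=0.9),
    InputExample(texts=["A man is eating food.", "A plane is landing."], label=0.1),
]
train_dataloader = DataLoader(train_examples, shuffle=True, batch_size=2)
train_loss = losses.CosineSimilarityLoss(model)  # siamese, cosine-based objective

model.fit(train_objectives=[(train_dataloader, train_loss)],
          epochs=1, warmup_steps=10)

embeddings = model.encode(["A new sentence to embed."])
```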
Hope this helps!