NLP training data

Hi, I have around 100k unlabelled email samples and want to fine-tune a model on this data. What strategies are available for identifying the most relevant or most diverse samples to label?

Colin