Pretrained model recommendations for tokenizing English news?

Hi,

I am aiming to train a model that calculates relatedness scores for daily news content from various sources. However, I first need a tokenizer model to create the corresponding embeddings.

Could you recommend a pretrained English model that would be a good fit for this purpose?

Best regards,
Alihan