How to choose a base model for fine-tuning

Hello. I am new to NLP. I am researching how Hugging Face can best be used for a use case in my organization: support ticket categorization. This seems like a text-classification task. I am planning to fine-tune a base model on the organization’s dataset.

My question is: how do I choose the base model that best suits my purpose? I have seen in the documentation that, although the task is text classification, a fill-mask model is used. The example fine-tunes distilbert-base-uncased, which is listed as a fill-mask model.
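For context, the pattern I am asking about looks roughly like the sketch below. This is my own reconstruction, not copied from the documentation: the label names in TICKET_LABELS and the helper names label_maps and load_classifier are made up for illustration. Only AutoTokenizer, AutoModelForSequenceClassification, and the distilbert-base-uncased checkpoint come from the library.

```python
# Hypothetical ticket categories; my real labels would come from our dataset.
TICKET_LABELS = ["billing", "login_issue", "bug_report", "feature_request"]


def label_maps(labels):
    """Build the id <-> label dictionaries that get passed to the model."""
    id2label = dict(enumerate(labels))
    label2id = {name: idx for idx, name in id2label.items()}
    return id2label, label2id


def load_classifier(checkpoint, labels):
    """Load a checkpoint through the sequence-classification auto class.

    The import is inside the function so that defining it does not
    require transformers; calling it downloads the checkpoint.
    """
    from transformers import AutoModelForSequenceClassification, AutoTokenizer

    id2label, label2id = label_maps(labels)
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(
        checkpoint,
        num_labels=len(labels),
        id2label=id2label,
        label2id=label2id,
    )
    return tokenizer, model
```

I would call this as load_classifier("distilbert-base-uncased", TICKET_LABELS). The checkpoint is the one tagged fill-mask, yet it is loaded through a sequence-classification class, which is the part I do not understand.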

So, is it okay to use any NLP base model for fine-tuning? How does that work?