@cakiki conceptually, I wonder whether training my own language model from scratch and then fine-tuning it for text classification would work better than fine-tuning the same old DistilBERT model that everybody is using. The corpus I am working with is highly specialized (medicine, for instance), so a dedicated language model makes sense.
I think it would depend on how much (and how different) specialized data you have (perhaps compare that to the size of the dataset the model was initially pretrained on). If it’s a considerable amount, it might make sense to continue pre-training from the checkpoint of the model you’re interested in.