Concept drift in pre-trained models


I have a high-level NLP question (apologies in advance if this is not the appropriate forum — feel free to delete if so).

I’m wondering how to think about drift in language usage over time in relation to language models such as those on the Hugging Face Hub. For example, bert-base-uncased was originally trained in 2018. Back then, “COVID” wasn’t a word, and “huggingface” referred to an emoji.

  • Does that mean that running inference with that model on texts about these specific topics will produce poorer results?
  • If I am fine-tuning a model, might I prefer a newer model as my base, even if it performs “worse” on some original benchmark?
  • Is there some way to see the training date in the model card, or to sort models by date trained (which is likely different from the date they were uploaded to the Hub)?