Improving zero-shot classification for labels split into multiple subword tokens

I’m trying to classify Yelp-like reviews based on a personalized set of topics that can change over time. Zero-shot classification seems to be a good fit for this task, and I’ve started working with bart-large-mnli, the model typically recommended for it.
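For context, my setup looks roughly like this (the review text and topic list are just illustrative):

```python
from transformers import pipeline

# Zero-shot classification via NLI, following the model card's usage
classifier = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

review = "Great coffee, but the staff enforced masking and distancing very strictly."
topics = ["covid", "service", "food quality"]  # personalized, changes over time

result = classifier(review, candidate_labels=topics, multi_label=True)
print(result["labels"], result["scores"])
```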

However, as I’ve gotten a feel for the performance of bart-large-mnli, I’ve realized that it struggles with topics that the tokenizer breaks into multiple subword pieces. For example, when trying to classify reviews as related to “covid”, which is tokenized as [‘cov’, ‘id’] by bart-large’s tokenizer, I get many false positives, such as reviews that focus on “collaboration”, “cooperation”, “competition”, or other words with similar beginnings.
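The split is easy to reproduce; this is the check I used (output shown as a comment):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")

# "covid" is not in the vocabulary, so it gets split into subword pieces
print(tokenizer.tokenize("covid"))  # ['cov', 'id']
```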

I understand that this is a general weakness of subword tokenization in NLP; however, it is especially pronounced in this form of zero-shot classification and thus can’t be ignored.

I wanted to get the community’s opinion on how to address this issue. Which of the following high-level options would you choose?

  1. Extend the vocabulary of bart-large for very popular topics (like the aforementioned ‘covid’) and conduct additional pretraining for the new embeddings (a minimal sketch of the vocabulary-extension step follows this list)
  2. Choose another model with a larger vocabulary of tokens
  3. Some other approach
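
For option 1, the vocabulary-extension step itself is simple with the transformers API; the expensive part is the additional pretraining needed to make the new embedding meaningful. A minimal sketch of just the extension step (the added token list is illustrative):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-large-mnli")
model = AutoModelForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

# Register "covid" as a single token; add_tokens returns how many were new
num_added = tokenizer.add_tokens(["covid"])

# Grow the embedding matrix so the new token gets a (randomly initialized) row;
# this row only becomes useful after further training on in-domain text
model.resize_token_embeddings(len(tokenizer))
```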

I greatly appreciate any kind of feedback you can provide.