Questions on model's tokens

Hi, I have a few questions regarding how words/characters/emojis are tokenized for different Hugging Face models.

From my understanding, a model will only perform at its best during inference if the tokens of the input sentence are within the vocabulary that the model's tokenizer was trained on.

My questions are:

  1. Is there a way to easily find out whether a particular word/emoji is compatible with the model (i.e. was included in the vocabulary used during model training)?

  2. If a word/emoji was not included during model training, what are the best ways to handle it, so that inference gives the best possible output when that word/emoji appears in the input? (For 2., it would be nice if this could be answered in the context of my setup below, if possible.)

My current setup is as follows:

from transformers import pipeline, AutoTokenizer, AutoModelForSequenceClassification
pre_trained_model = 'facebook/bart-large-mnli'
task = 'zero-shot-classification'
candidate_labels = ['happy', 'sad', 'angry', 'confused']
tokenizer = AutoTokenizer.from_pretrained(pre_trained_model)
model = AutoModelForSequenceClassification.from_pretrained(pre_trained_model)
zero_shot_classifier = pipeline(model=model, tokenizer=tokenizer, task=task)

zero_shot_classifier('today is a good day 😃', candidate_labels=candidate_labels)
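As a starting point for question 1, I tried inspecting how the tokenizer splits the emoji and checking whether anything falls back to the unknown token. I'm not sure this is the right way to test "compatibility", since BART uses a byte-level BPE and (as far as I understand) unseen characters get broken into byte-level pieces rather than mapping to `<unk>`:

```python
from transformers import AutoTokenizer

# Same checkpoint as in my setup above
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-mnli')

# See which subword pieces the emoji sentence is split into
pieces = tokenizer.tokenize('today is a good day 😃')
print(pieces)

# Check whether any token id falls back to the unknown token.
# With byte-level BPE I believe this is always False, which is
# why I'm unsure this check really answers question 1.
ids = tokenizer.encode('😃', add_special_tokens=False)
print(tokenizer.unk_token_id in ids)
```

In my case the emoji comes out as a few byte-level pieces rather than `<unk>`, but I don't know whether that means the model actually learned anything meaningful about it during pre-training.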

Any help is appreciated :hugs: