Why do I get 'Ġ' when adding emojis to the tokenizer?


I added custom emoji tokens to my tokenizer. This is the code I used to add the new tokens:

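from transformers import AutoModelForMaskedLM, AutoTokenizer

# Assumption: the tokenizer is loaded from the same checkpoint as the model;
# model_checkpoint is the checkpoint name or path defined earlier.
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)

num_added_toks = tokenizer.add_tokens(['👏'])
print('We have added', num_added_toks, 'tokens')
model.resize_token_embeddings(len(tokenizer))  # Note: resize_token_embeddings expects the full size of the new vocabulary, i.e. the length of the tokenizer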


We have added 1 tokens
Embedding(50270, 768)

However, when I tokenize a phrase with this code:

print(tokenizer.tokenize('Congrats 👏'))

I get this output, which contains a strange 'Ġ' symbol:

['Cong', 'rats', 'Ġ', '👏']
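
A possibly relevant detail: the vocabulary size above suggests a byte-level BPE tokenizer (GPT-2 or RoBERTa style), although the checkpoint is not named here, so that is an assumption. Such tokenizers display every raw byte as a printable character, and the space byte ends up shown as 'Ġ'. The sketch below reimplements that byte-to-character table in plain Python to illustrate the idea; the function name `bytes_to_unicode_sketch` is made up for this example and is not imported from any library.

```python
def bytes_to_unicode_sketch():
    """Illustrative rebuild of a GPT-2 style byte-to-character table."""
    # Bytes that already have a visible character keep that character.
    printable = (
        list(range(ord("!"), ord("~") + 1))
        + list(range(ord("¡"), ord("¬") + 1))
        + list(range(ord("®"), ord("ÿ") + 1))
    )
    byte_values = printable[:]
    code_points = printable[:]
    offset = 0
    # Every other byte (control characters, space, ...) is shifted
    # to a code point starting at 256 so it becomes visible.
    for b in range(256):
        if b not in printable:
            byte_values.append(b)
            code_points.append(256 + offset)
            offset += 1
    return {b: chr(c) for b, c in zip(byte_values, code_points)}


table = bytes_to_unicode_sketch()
space_marker = table[ord(" ")]
print(space_marker, hex(ord(space_marker)))  # prints: Ġ 0x120
```

If that assumption holds, the 'Ġ' token in the output would simply be the space between 'Congrats' and the emoji, split off as its own token because the added token '👏' is matched separately from the surrounding text.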