How to use tokenizer.tokenize properly with Chinese data?

When I add a Chinese token to the tokenizer with add_tokens, it is still not tokenized as a single token. How should I fix this?

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-chinese')
text = ['cov19', '病毒']

# tokenize expects a string, so join the list before passing it in
tokenizer.tokenize(' '.join(text))  # ['co', '##v', '##19', '病', '毒']
tokenizer.add_tokens(text)
tokenizer.tokenize(' '.join(text))  # ['cov19', '病', '毒']
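
For completeness, here is a minimal, self-contained version of the check I can run, with a couple of extra calls (the add_tokens return value, len(tokenizer), and convert_tokens_to_ids) that I added only to confirm the new tokens really are registered in the vocabulary; the values in the comments are what I would expect, not guaranteed output:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-chinese')

# add_tokens returns how many tokens were actually added to the vocabulary
num_added = tokenizer.add_tokens(['cov19', '病毒'])
print(num_added)        # should be 2 if neither token was already present

# len(tokenizer) includes the added tokens
print(len(tokenizer))   # base vocabulary size + num_added

# If '病毒' was registered, it should map to a single new id rather than the [UNK] id
print(tokenizer.convert_tokens_to_ids('病毒'))
print(tokenizer.unk_token_id)

If those checks look right, the tokens are in the vocabulary, so the question is why tokenize still splits '病毒' into single characters.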

I would be grateful for any help you can provide.
