Byte Level Tokenizer While Training

Hi, I have trained a tokenizer using the BPE model with a ByteLevel pre-tokenizer:

```python
from tokenizers import Tokenizer, models, pre_tokenizers, decoders

tokenizer = Tokenizer(models.BPE(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.ByteLevel()
tokenizer.decoder = decoders.ByteLevel()
```
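(For context, a minimal sketch of what the ByteLevel pre-tokenizer does: it remaps each UTF-8 byte of the input to a printable character, which is why Devanagari text shows up the way it does below. The sample string here is my own illustration:)

```python
# Each UTF-8 byte is remapped to a printable character, so
# "न" (bytes E0 A4 A8) becomes "à¤¨", "े" becomes "à¥ĩ", and so on.
# Ġ is the remapped space (add_prefix_space=True by default).
sample = "नेपाली भाषा"  # "Nepali language"
print(pre_tokenizers.ByteLevel().pre_tokenize_str(sample))
# roughly: [('Ġà¤¨à¥ĩà¤ªà¤¾à¤²à¥Ģ', (0, 6)), ('Ġà¤Ńà¤¾à¤·à¤¾', (6, 11))]
```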
Now my vocabulary is saved in this byte-level form, and `tokenizer.tokenize` gives me byte-level output too, which is expected.

The `tokenizer.tokenize` output is:

```
['Ġन',
 'à¥ĩ',
 'प',
 'ा',
 'ल',
 'à¥Ģ',
 'Ġà¤Ń',
 'ा',
 'ष',
 'ा',
 'म',
 'ा',
 'Ġय',
 'à¥ĭ',
 'Ġà¤ıà¤ķ',
 'Ġà¤īद',
 'ा',
 'हरण',
 'Ġह',
 'à¥ĭ।']
```

Is there a way to save my vocabulary as Unicode characters rather than byte-level strings, and to show the tokens as Unicode characters too?
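(Since `tokenizer.decoder = decoders.ByteLevel()` is set, `tokenizer.decode(ids)` already returns readable text for a whole sequence. For viewing individual tokens or vocabulary entries, one option is to invert the GPT-2-style byte-to-character table that ByteLevel uses. This is only a sketch; `bytes_to_unicode` and `token_to_unicode` are my own helper names, not part of the library:)

```python
def bytes_to_unicode():
    """The GPT-2 byte-to-character table that ByteLevel uses."""
    bs = (list(range(ord("!"), ord("~") + 1))
          + list(range(ord("¡"), ord("¬") + 1))
          + list(range(ord("®"), ord("ÿ") + 1)))
    cs = bs[:]
    n = 0
    for b in range(256):
        if b not in bs:
            bs.append(b)
            cs.append(256 + n)
            n += 1
    return dict(zip(bs, (chr(c) for c in cs)))

# Invert the table: printable character -> original byte value.
byte_decoder = {c: b for b, c in bytes_to_unicode().items()}

def token_to_unicode(token):
    """Map one byte-level token string back to readable text.

    Assumes every character in `token` comes from the table above.
    """
    data = bytes(byte_decoder[c] for c in token)
    # A token can end mid-way through a multi-byte UTF-8 character;
    # errors="replace" keeps such fragments printable.
    return data.decode("utf-8", errors="replace")

print(token_to_unicode("Ġà¤ıà¤ķ"))  # ' एक'
```

The vocabulary file itself will still store the byte-level strings, but the same inverse map can be applied to every entry for display.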
