I am working with the mistralai/Mistral-7B-v0.1
model. I loaded the tokenizer via:
tokenizer = transformers.AutoTokenizer.from_pretrained('mistralai/Mistral-7B-v0.1')
then ran the following code:
tokenizer.decode([69])
produces the output B.
tokenizer.decode([198])
produces the output �. This is perhaps understandable, since token ID 198 corresponds to the token <0xC3>, and the byte 0xC3 (Ã in Latin-1) is not a valid UTF-8 sequence on its own, so it may be unprintable.
But then:
tokenizer.decode([69,198])
produces the output ��. I don’t know why it produces this instead of B�.
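To make my expectation concrete, here is a minimal sketch in plain Python that does not use the tokenizer at all. It assumes the byte tokens <0x00> to <0xFF> sit at IDs 3 to 258, so the byte value is the token ID minus 3. That is my own inference from the two IDs above (198 - 3 = 0xC3, and 69 - 3 = 0x42, which is B), not something I have confirmed in the library. The names BYTE_TOKEN_OFFSET and decode_ids_as_bytes are hypothetical helpers for this post only.

```python
# Assumption: byte-fallback tokens <0x00>..<0xFF> occupy IDs 3..258,
# so byte value = token ID - 3. Inferred from the IDs in this post only.
BYTE_TOKEN_OFFSET = 3  # hypothetical constant, not taken from the library


def decode_ids_as_bytes(ids):
    """Join the assumed byte values and decode them with Python's UTF-8 codec."""
    raw = bytes(i - BYTE_TOKEN_OFFSET for i in ids)
    # errors="replace" substitutes U+FFFD for each undecodable sequence
    # and leaves the valid bytes around it intact.
    return raw.decode("utf-8", errors="replace")


# ascii() escapes non-ASCII characters so the result is unambiguous.
print(ascii(decode_ids_as_bytes([69])))       # 'B'
print(ascii(decode_ids_as_bytes([198])))      # '\ufffd'
print(ascii(decode_ids_as_bytes([69, 198])))  # 'B\ufffd'
```

Under that assumption, Python's own decoder keeps the B and replaces only the dangling 0xC3 byte, which is the B� I expected from the tokenizer.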
Any help will be appreciated!