Hey! Not sure I completely understand, but the tokens that you have here are the plain tokens, as they are in the vocab / merge. You should modify the tokenizer if you do not want it to add the spiece token at the beginning. Which tokenizer are you using?
Iâm not sure if this is a hack, but to get the behavior I wanted, I just passed the token ids into decode_batch instead, and that returned each token without the odd encoding.