Issues with BPE tokenizer

I tried to train a Hugging Face BPE tokenizer for Pashto from scratch, but it is not decoding words correctly.

No one can help you with such limited information.

  1. I am unfamiliar with Pashto. Does the official documentation say anything about BPE?
  2. Is it possible that a model for Pashto has already been pre-trained with a different tokenizer?
  3. Post your code so we can understand what the software is trying to tell you.

I solved the issue. :nauseated_face: