Convert a Hugging Face tokenizer into SentencePiece format

I have a Hugging Face tokenizer for the BERT model (google-bert/bert-base-cased), which consists of three files: tokenizer.json, tokenizer_config.json, and vocab.txt. I would like to convert this tokenizer into the SentencePiece format, which uses a single .model file.
How can I perform this conversion?

Similar problem here. I would like to convert the SmolLM2-360M Hugging Face tokenizer to SentencePiece format but couldn’t find a way to do so. Can anyone offer guidance?
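
One possible first step, before attempting any conversion, is to check which algorithm the tokenizer actually uses. The sketch below reads the `model.type` field from a tokenizer.json and compares it against the model types SentencePiece can train (unigram, bpe, char, word). The function names are hypothetical helpers made up for illustration, and a matching name is only a hint: normalizers and pre-tokenizers may still differ, so this is not a conversion and does not guarantee that one exists. BERT tokenizers use WordPiece, which is not among the SentencePiece model types.

```python
import json

# Model types accepted by SentencePiece's trainer (its model_type option).
SENTENCEPIECE_MODEL_TYPES = {"unigram", "bpe", "char", "word"}


def tokenizer_model_type(tokenizer_json_path):
    """Return the "model.type" field of a Hugging Face tokenizer.json.

    Hypothetical helper: typical values are "WordPiece", "BPE", "Unigram".
    """
    with open(tokenizer_json_path, encoding="utf-8") as f:
        config = json.load(f)
    return config["model"]["type"]


def has_sentencepiece_counterpart(model_type):
    """Check whether SentencePiece has a model type with the same name.

    This compares names only; it says nothing about normalization or
    pre-tokenization, which can still make the two tokenizers disagree.
    """
    return model_type.lower() in SENTENCEPIECE_MODEL_TYPES


if __name__ == "__main__":
    # Assumes tokenizer.json sits in the current directory.
    model_type = tokenizer_model_type("tokenizer.json")
    print(model_type, has_sentencepiece_counterpart(model_type))
```

If the check reports WordPiece, a direct file-format conversion is unlikely to preserve the tokenization, and retraining a SentencePiece model on text may be the more realistic route.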