How do you use SentencePiece for BPE of sequences with no whitespace

I am trying to use byte pair encoding on amino acid sequences which have no spaces:


The tokenizer summary section of the docs suggests that SentencePiece could be useful, as it treats the input as a raw stream, includes the space in the set of characters to use, and then uses BPE or unigram to construct the appropriate vocabulary.
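To make "BPE over a raw stream" concrete for input with no whitespace, here is a toy pure-Python sketch. It is not SentencePiece's actual implementation. Each sequence is split into single characters, a space (if any) becomes the marker `▁`, and the most frequent adjacent pair is merged repeatedly. The names `learn_bpe`, `merge_pair` and `encode` are hypothetical, chosen for illustration.

```python
from collections import Counter

SPACE_MARKER = "\u2581"  # "▁", the visible stand-in SentencePiece uses for a space


def merge_pair(symbols, a, b):
    """Replace every adjacent (a, b) in symbols with the merged symbol a + b."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return out


def learn_bpe(sequences, num_merges):
    """Learn up to num_merges BPE merges from raw character streams."""
    # No pre-tokenization: each sequence is one stream of single characters,
    # and a space (if present) is just another symbol.
    corpus = [list(seq.replace(" ", SPACE_MARKER)) for seq in sequences]
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols in corpus:
            pairs.update(zip(symbols, symbols[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]  # ties: first pair seen wins
        if count < 2:  # stop once no pair repeats
            break
        merges.append((a, b))
        corpus = [merge_pair(symbols, a, b) for symbols in corpus]
    return merges


def encode(seq, merges):
    """Segment a new sequence by replaying the learned merges in order."""
    symbols = list(seq.replace(" ", SPACE_MARKER))
    for a, b in merges:
        symbols = merge_pair(symbols, a, b)
    return symbols


merges = learn_bpe(["MKTMKTMKT", "MKTAAMKT"], num_merges=10)
print(merges)                     # [('M', 'K'), ('MK', 'T'), ('MKT', 'MKT')]
print(encode("MKTAMKT", merges))  # ['MKT', 'A', 'MKT']
```

Because nothing is split on whitespace first, a whole amino acid sequence is simply one long stream, and recurring residue runs like `MKT` become vocabulary entries on their own.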

How would I train a tokenizer from scratch using SentencePiece? The tokenizer library seems to only support