The tokens you add with `add_tokens` are not added directly to the original vocabulary. Instead, they are kept in a separate vocabulary of added tokens, which is matched first, so whatever you define manually always takes priority.
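For instance (a minimal sketch; `bert-base-uncased` is only an illustrative choice, and `hellocommitted` a made-up word):

```python
from transformers import AutoTokenizer

# Any tokenizer works here; bert-base-uncased is just an example.
tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')

tokenizer.add_tokens(['hellocommitted'])
print(tokenizer.get_added_vocab())
# {'hellocommitted': 30522} -- kept in its own table, with an id after the original vocabulary

print(tokenizer.tokenize('hellocommitted'))
# ['hellocommitted'] -- matched before the regular vocabulary gets a chance to split it
```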
As you noticed, if you write `##committed` literally in the input text, it will use your token, but without the `##` it won't match. This is simply because added tokens are matched literally, exactly as you added them.
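For instance (same illustrative setup as above):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')  # illustrative choice
tokenizer.add_tokens(['##committed'])

print(tokenizer.tokenize('hello ##committed'))
# ['hello', '##committed'] -- the literal string '##committed' in the text matches your token

print(tokenizer.tokenize('hellocommitted'))
# no '##' in the text, so the added token is ignored and the
# regular vocabulary splits the word into subword pieces
```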
So, you should be able to achieve what you want by doing:
```python
tokenizer.add_tokens(['committed'])
tokenizer.tokenize('hellocommitted')
# ['hello', 'committed']
```