Tokenizer not recognising words in vocabulary

I have an issue where a tokenizer doesn’t recognise tokens in its own vocabulary. A minimal example is:

from transformers import AutoTokenizer

model_checkpoint = 'DeepChem/ChemBERTa-77M-MTR'
tokenizer = AutoTokenizer.from_pretrained(model_checkpoint)
test_smiles = 'CCC1=[O+]'
print(tokenizer.vocab['[O+]'])
print(tokenizer.tokenize(test_smiles))

This outputs:

73
['C', 'C', 'C', '1', '=', 'O']

Notice that the '[O+]' token is tokenized simply as 'O', even though '[O+]' is in the vocabulary; the brackets and the '+' charge are silently dropped. This loses important chemical information.
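One possible contributing factor (an assumption on my part, not verified against this checkpoint, and it would not by itself explain the characters vanishing entirely): with a BPE tokenizer, having a token in the vocabulary is not enough for it to ever be produced. The merge rules must actually build it up from smaller pieces. Here is a toy, pure-Python sketch of that distinction (hypothetical vocab and merges, not the real ChemBERTa tokenizer, and it ignores pre-tokenization):

```python
def bpe_tokenize(text, merges):
    """Greedy toy BPE: start from characters, apply merge rules until stable."""
    tokens = list(text)
    changed = True
    while changed:
        changed = False
        for a, b in merges:
            i = 0
            while i < len(tokens) - 1:
                if tokens[i] == a and tokens[i + 1] == b:
                    tokens[i:i + 2] = [a + b]
                    changed = True
                else:
                    i += 1
    return tokens

# '[O+]' is "in the vocabulary" ...
vocab = {"C", "1", "=", "O", "[", "+", "]", "[O", "[O+", "[O+]"}

# ... but with no merge rules that combine '[' and 'O', it is never emitted:
print(bpe_tokenize("CCC1=[O+]", []))
# ['C', 'C', 'C', '1', '=', '[', 'O', '+', ']']

# Only with the full merge chain does the multi-character token appear:
merges = [("[", "O"), ("[O", "+"), ("[O+", "]")]
print(bpe_tokenize("CCC1=[O+]", merges))
# ['C', 'C', 'C', '1', '=', '[O+]']
```

If something like this is going on, the interesting question would be whether the checkpoint's tokenizer files ship the merges (or pre-tokenization rules) needed to produce '[O+]'.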

(also posted here as I’m not sure where exactly the issue is)