Create entirely new vocabulary for tokenizer

I am building an encoder-decoder model based on facebook/bart-base for the purpose of solving math problems. I would like to train the model so that the decoder can only output a small set of words (e.g. “multiply”, “divide”, “add”, “subtract”, etc.) and numbers. Is it possible to completely replace the vocabulary used by the decoder tokenizer, rather than just adding new tokens to it?
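To make the goal concrete, here is a minimal plain-Python sketch of the behavior I want from the decoder tokenizer (all token names and ids below are made up for illustration; the real vocabulary would also include number tokens and the special tokens BART expects):

```python
# Illustrative restricted vocabulary -- ids are arbitrary examples.
VOCAB = {"<pad>": 0, "<s>": 1, "</s>": 2, "<unk>": 3,
         "add": 4, "subtract": 5, "multiply": 6, "divide": 7}
ID_TO_TOKEN = {i: t for t, i in VOCAB.items()}

def encode(text: str) -> list[int]:
    """Map whitespace-separated words to ids; unknown words become <unk>."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.split()]

def decode(ids: list[int]) -> str:
    """Map ids back to their tokens."""
    return " ".join(ID_TO_TOKEN[i] for i in ids)

print(encode("multiply add"))  # [6, 4]
print(decode([7, 5]))          # divide subtract
```

Essentially, I want the decoder side of the model to operate over a closed vocabulary like this, with everything outside it mapped to an unknown token, instead of the full 50k-token vocabulary that ships with the pretrained checkpoint.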