In my experience, different models generally use different tokenisers. The tokeniser splits and formats the input into the representation the model expects. If you try to use a tokeniser other than the model's own, it may throw an error, because the input is no longer structured the way the model expects.
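To make the mismatch concrete, here is a toy sketch with two hypothetical word-level vocabularies (real tokenisers are far more sophisticated, but the failure mode is the same): the same word can map to different IDs under different tokenisers, and words missing from the other vocabulary cannot be encoded at all.

```python
# Toy illustration (hypothetical vocabularies): each "model" expects IDs
# produced by its own tokeniser.

VOCAB_A = {"the": 0, "cat": 1, "sat": 2}  # tokeniser for model A
VOCAB_B = {"sat": 0, "the": 1, "mat": 2}  # tokeniser for model B

def encode(text, vocab):
    """Split on whitespace and map each word to its vocabulary ID."""
    return [vocab[word] for word in text.split()]

ids_a = encode("the cat sat", VOCAB_A)  # [0, 1, 2] -- what model A expects

try:
    encode("the cat sat", VOCAB_B)
except KeyError as err:
    # "cat" is not in model B's vocabulary: the input cannot be
    # represented, which in a real framework surfaces as an error
    # or an unknown-token ID.
    print(f"out-of-vocabulary word: {err}")

# Even shared words get different IDs ("the" is 0 in A but 1 in B),
# so feeding A-encoded IDs into model B scrambles the meaning.
print(ids_a)  # [0, 1, 2]
```

The same reason is why frameworks such as Hugging Face Transformers ship each pretrained model together with its matching tokeniser rather than letting you mix and match freely.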
In the papers I've read, when the performance of different models is evaluated, each model's tokeniser is included as part of that evaluation; essentially, the tokeniser is treated as part of the model. So you can compare the performance of two models on the same dataset, each using its own tokeniser, and that is standard ML practice.
This Stack Overflow question may be of further help: https://stackoverflow.com/questions/72625528/translation-between-different-tokenizers