Help defining a tokenizer

I'm just curious: I want to train multiple models on the same dataloader, kind of like in vision. Is there any way to train a new tokenizer that's not specific to a model, so that I can run the following workflow?


from datasets import load_dataset
from torch.utils.data import DataLoader
from transformers import AutoModelForMaskedLM

dataset = load_dataset('wiki')
model1 = AutoModelForMaskedLM.from_pretrained("model1name")
model2 = AutoModelForMaskedLM.from_pretrained("model2name")

### help me with code to tokenize the dataset here
tokenizer = (...)  # I would have done a from_pretrained here, but I'm not sure what to do, since model1 and model2 might have different tokenizers

def tokenize_function(examples):
    # calling the tokenizer directly returns a dict of lists
    # (input_ids, attention_mask, ...), which is what map expects with
    # batched=True; .encode() would only return a single list of ids
    return tokenizer(examples["text"], padding="max_length", truncation=True)

dataset = dataset.map(tokenize_function, batched=True)
# make the tokenized columns come out as tensors for the DataLoader
dataset.set_format("torch", columns=["input_ids", "attention_mask"])

dataloader = DataLoader(dataset)
####

for batch in dataloader:
  y1 = model1(**batch)  # the models take keyword args (input_ids=..., attention_mask=...)
  y2 = model2(**batch)
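
The closest thing I could piece together for the (...) part is training a fresh tokenizer from scratch with the tokenizers library and wrapping it in PreTrainedTokenizerFast so it exposes the usual __call__ API. A rough sketch of what I mean (the BPE model, vocab size, max length, and special tokens are just placeholders I picked, not something I've verified):

from tokenizers import Tokenizer
from tokenizers.models import BPE
from tokenizers.trainers import BpeTrainer
from tokenizers.pre_tokenizers import Whitespace
from transformers import PreTrainedTokenizerFast

# train a standalone BPE tokenizer on the raw text
# (i.e., this would go where the (...) is above, before the map step)
raw_tokenizer = Tokenizer(BPE(unk_token="[UNK]"))
raw_tokenizer.pre_tokenizer = Whitespace()
trainer = BpeTrainer(
    vocab_size=30000,  # placeholder
    special_tokens=["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"],
)
raw_tokenizer.train_from_iterator((ex["text"] for ex in dataset), trainer=trainer)

# wrap it so it behaves like any model-specific tokenizer
tokenizer = PreTrainedTokenizerFast(
    tokenizer_object=raw_tokenizer,
    model_max_length=512,  # placeholder, needed for padding="max_length"
    unk_token="[UNK]",
    pad_token="[PAD]",
    cls_token="[CLS]",
    sep_token="[SEP]",
    mask_token="[MASK]",
)

If that's the right direction, I'm guessing both models would also need model1.resize_token_embeddings(len(tokenizer)) (and likewise for model2), since their embedding layers are sized for their original vocabularies. Is that right?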