Two approaches to training a tokenizer

I have seen these two approaches for training a tokenizer with HuggingFace:

Approach 1

Ref: How to train a new language model from scratch using Transformers and Tokenizers

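from tokenizers.implementations import ByteLevelBPETokenizer

tokenizer = ByteLevelBPETokenizer()
paths = ['wikitext-2.txt']

# Learn the byte-level BPE vocabulary and merges from the training files
tokenizer.train(files=paths, vocab_size=52_000, min_frequency=2,
                special_tokens=['<s>', '<pad>', '</s>', '<unk>', '<mask>'])

encoding = tokenizer.encode('hello world')
print(encoding.ids)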
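For a quick check that runs without WikiText-2 on disk, the sketch below trains the same wrapper on a tiny in-memory corpus using its `train_from_iterator` method. The corpus, `vocab_size=300` and the special-token list are illustrative assumptions. Because the byte-level pre-tokenizer and decoder operate on raw bytes, decoding the ids should give back the input text.

```python
from tokenizers.implementations import ByteLevelBPETokenizer

# Tiny illustrative corpus (an assumption; stands in for wikitext-2.txt)
corpus = [
    "hello world",
    "hello there, world",
    "the world says hello",
] * 10

tokenizer = ByteLevelBPETokenizer()
# The wrapper takes training hyperparameters directly as keyword arguments
tokenizer.train_from_iterator(
    corpus,
    vocab_size=300,
    min_frequency=2,
    show_progress=False,
    special_tokens=["<s>", "<pad>", "</s>", "<unk>", "<mask>"],
)

encoding = tokenizer.encode("hello world")
print(encoding.tokens)

# Byte-level BPE is lossless, so decoding should reproduce the original text
decoded = tokenizer.decode(encoding.ids)
print(decoded)
```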

Approach 2

Ref: Building a tokenizer, block by block - Hugging Face Course

from tokenizers import models, pre_tokenizers, trainers, Tokenizer

tokenizer = Tokenizer(model=models.WordPiece(unk_token="[UNK]"))
# Split raw text into words before WordPiece runs
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

special_tokens = ["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"]
trainer = trainers.WordPieceTrainer(vocab_size=25000, special_tokens=special_tokens)

tokenizer.train(["wikitext-2.txt"], trainer=trainer)

encoding = tokenizer.encode("Let's test this tokenizer...", "on a pair of sentences.")
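As far as I can tell, encoding a pair this way does not insert `[CLS]`/`[SEP]` on its own: a `Tokenizer` only wraps inputs in special tokens once a post-processor is attached, which the course does with `processors.TemplateProcessing`. The sketch below assumes a tiny in-memory corpus in place of `wikitext-2.txt`, a small `vocab_size`, and a whitespace pre-tokenizer; the template strings follow BERT's layout.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, processors, trainers

# Tiny illustrative corpus (an assumption; stands in for wikitext-2.txt)
corpus = [
    "Let's test this tokenizer on a pair of sentences.",
    "This tokenizer splits words into word pieces.",
] * 20

tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

special_tokens = ["[UNK]", "[PAD]", "[CLS]", "[SEP]", "[MASK]"]
trainer = trainers.WordPieceTrainer(
    vocab_size=200, special_tokens=special_tokens, show_progress=False
)
tokenizer.train_from_iterator(corpus, trainer=trainer)

# Wrap single inputs and pairs in BERT-style special tokens;
# ":1" gives the second segment (and its closing [SEP]) type id 1
cls_id = tokenizer.token_to_id("[CLS]")
sep_id = tokenizer.token_to_id("[SEP]")
tokenizer.post_processor = processors.TemplateProcessing(
    single="[CLS] $A [SEP]",
    pair="[CLS] $A [SEP] $B:1 [SEP]:1",
    special_tokens=[("[CLS]", cls_id), ("[SEP]", sep_id)],
)

encoding = tokenizer.encode("Let's test this tokenizer...", "on a pair of sentences.")
print(encoding.tokens)
print(encoding.type_ids)
```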

My question is: what is the difference between the two approaches, and when should I use which one?

If I understand correctly, in the latter approach the model represents the tokenization algorithm. In that case, what does the trainer do? Does it represent the vocabulary and add new tokens to it?
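One way to probe the model/trainer split empirically: the model object holds the vocabulary (empty before training), while the trainer is configured with settings such as `vocab_size` and `special_tokens` and, when passed to `train`, learns a vocabulary from the corpus and writes it into the model. My understanding of the library is that this replaces the model's vocabulary rather than appending to it; the sketch below checks that on two toy corpora (both corpora and the sizes are illustrative assumptions), using a fresh trainer for each run.

```python
from tokenizers import Tokenizer, models, pre_tokenizers, trainers

tokenizer = Tokenizer(models.WordPiece(unk_token="[UNK]"))
tokenizer.pre_tokenizer = pre_tokenizers.Whitespace()

# Before any training, the model's vocabulary is empty
size_before = tokenizer.get_vocab_size()


def make_trainer():
    # The trainer only describes *how* to learn a vocabulary
    return trainers.WordPieceTrainer(
        vocab_size=100, special_tokens=["[UNK]"], show_progress=False
    )


# First run: the learned vocabulary is stored in the model
tokenizer.train_from_iterator(["apple banana"] * 10, trainer=make_trainer())
first_vocab = tokenizer.get_vocab()

# Second run on a different corpus: the model's vocabulary is rebuilt
tokenizer.train_from_iterator(["zebra kiwi"] * 10, trainer=make_trainer())
second_vocab = tokenizer.get_vocab()

print(size_before, "apple" in first_vocab, "apple" in second_vocab)
```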

Also, in approach 1, the tokenizer implicitly contains the model (the .model attribute), but its train method does not accept a trainer argument. Why?
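My understanding is that `ByteLevelBPETokenizer` is a convenience wrapper around a regular `Tokenizer`: its constructor sets up a `BPE` model with a byte-level pre-tokenizer and decoder, and its `train`/`train_from_iterator` methods take the hyperparameters as keyword arguments and build a `trainers.BpeTrainer` internally. The sketch below assembles a comparable pipeline by hand with the Approach 2 building blocks so that the trainer becomes explicit; the corpus and `vocab_size` are illustrative assumptions, and it is not claimed to reproduce the wrapper's configuration exactly.

```python
from tokenizers import Tokenizer, decoders, models, pre_tokenizers, trainers
from tokenizers.implementations import ByteLevelBPETokenizer

corpus = ["hello world", "hello there, world", "the world says hello"] * 10
special_tokens = ["<s>", "<pad>", "</s>", "<unk>", "<mask>"]

# Approach 1: the wrapper already holds a BPE model...
wrapped = ByteLevelBPETokenizer()
print(type(wrapped.model).__name__)

# ...and builds its own trainer from these keyword arguments
wrapped.train_from_iterator(corpus, vocab_size=300, min_frequency=2,
                            show_progress=False, special_tokens=special_tokens)

# Approach 2: a comparable byte-level BPE pipeline, assembled explicitly
manual = Tokenizer(models.BPE())
manual.pre_tokenizer = pre_tokenizers.ByteLevel(add_prefix_space=False)
manual.decoder = decoders.ByteLevel()
trainer = trainers.BpeTrainer(
    vocab_size=300,
    min_frequency=2,
    show_progress=False,
    special_tokens=special_tokens,
    # Start from all 256 byte symbols so no input byte is unknown
    initial_alphabet=pre_tokenizers.ByteLevel.alphabet(),
)
manual.train_from_iterator(corpus, trainer=trainer)

print(wrapped.encode("hello world").tokens)
print(manual.encode("hello world").tokens)
```

Which to use then comes down to convenience versus control: the wrapper fixes the pipeline for a common case, while the building-block API lets you pick each component and the trainer yourself.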