What does Wav2Vec2Tokenizer do, and how does it differ from Wav2Vec2FeatureExtractor?

I know my question seems basic, but I am asking specifically about audio data. I know what tokenization means for text data: dividing the text into tokens (words, characters, or subwords).
I was reading the Hugging Face documentation for Wav2Vec2, but I did not understand what tokenization means in the context of audio.
I also tried Wav2Vec2FeatureExtractor, which normalizes the data, and I found that its output is the same as Wav2Vec2Tokenizer's output.
For example:

from transformers import Wav2Vec2FeatureExtractor, Wav2Vec2Model, Wav2Vec2Tokenizer
from datasets import load_dataset
import soundfile as sf

tokenizer = Wav2Vec2Tokenizer.from_pretrained("facebook/wav2vec2-base-960h")
model = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base-960h")

def map_to_array(batch):
    speech, _ = sf.read(batch["file"])
    batch["speech"] = speech
    return batch

ds = load_dataset("patrickvonplaten/librispeech_asr_dummy", "clean", split="validation")
ds = ds.map(map_to_array)

input_values = tokenizer(ds["speech"][0], return_tensors="pt").input_values  # Batch size 1
hidden_states = model(input_values).last_hidden_state

feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/wav2vec2-base-960h")
i = feature_extractor(ds["speech"][0], return_tensors="pt", sampling_rate=16000).input_values

In this example, i equals input_values.
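
By "normalizes" I mean, as far as I can tell, zero-mean unit-variance scaling of the raw waveform. Here is a minimal plain-Python sketch of that idea. The function name zero_mean_unit_var, the eps value, and the sample values are my own assumptions for illustration, not the library's actual code:

```python
import math


def zero_mean_unit_var(samples, eps=1e-7):
    """Scale a 1-D waveform to zero mean and (approximately) unit variance.

    Hypothetical helper for illustration; eps only guards against
    division by zero on silent input.
    """
    n = len(samples)
    mean = sum(samples) / n
    # Population variance of the waveform
    var = sum((x - mean) ** 2 for x in samples) / n
    return [(x - mean) / math.sqrt(var + eps) for x in samples]


# Made-up sample values, not real audio
waveform = [0.1, -0.2, 0.4, 0.0, -0.3]
normalized = zero_mean_unit_var(waveform)
```

If this is all both classes do to the audio, it would explain why the two outputs match, but it does not tell me where tokenization comes in.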
What is the difference between Wav2Vec2FeatureExtractor and Wav2Vec2Tokenizer?