Okay, I figured it out.
In my case the problem came from passing the class Wav2Vec2FeatureExtractor itself
to DataCollatorForWav2Vec2Pretraining
instead of an instance of that class.
Make sure the feature extractor is instantiated (for example with Wav2Vec2FeatureExtractor.from_pretrained(...)) before passing it to the data collator.
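
For anyone hitting the same thing, here is a minimal sketch of the class-versus-instance mix-up. It does not use the real transformers classes: `FakeFeatureExtractor` and `FakeCollator` are hypothetical stand-ins I made up so the snippet runs on its own, and the padding logic is only an assumption about the kind of method the collator ends up calling.

```python
from dataclasses import dataclass
from typing import Any


class FakeFeatureExtractor:
    """Hypothetical stand-in for Wav2Vec2FeatureExtractor (not the real class)."""

    def __init__(self, padding_value: float = 0.0):
        self.padding_value = padding_value

    def pad(self, features):
        # Pad every sequence to the length of the longest one.
        longest = max(len(f) for f in features)
        return [f + [self.padding_value] * (longest - len(f)) for f in features]


@dataclass
class FakeCollator:
    """Hypothetical stand-in for DataCollatorForWav2Vec2Pretraining."""

    feature_extractor: Any

    def __call__(self, features):
        # The collator calls a method on whatever object it was given.
        return self.feature_extractor.pad(features)


batch = [[1.0, 2.0, 3.0], [4.0]]

# Wrong: the class itself is passed, so `pad` is called without a bound `self`
# and the batch is silently taken as `self`.
broken = FakeCollator(feature_extractor=FakeFeatureExtractor)
try:
    broken(batch)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Right: an instance is created first and then passed in.
working = FakeCollator(feature_extractor=FakeFeatureExtractor())
padded = working(batch)

print("class passed    ->", error_message)
print("instance passed ->", padded)
```

With the real classes the fix has the same shape: create the extractor object first, then hand that object (not the class name) to the collator.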