AttributeError: 'str' object has no attribute 'dtype' when pretraining wav2vec2

Okay, I figured it out.

In my case, the error came from passing the class Wav2Vec2FeatureExtractor itself to DataCollatorForWav2Vec2Pretraining instead of an instance of that class.

Make sure you create an instance of the feature extractor first, and pass that instance (not the class) to the data collator.
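
To illustrate the difference between the class and an instance, here is a minimal sketch that runs without transformers installed. FeatureExtractor and Collator are hypothetical stand-ins I made up for Wav2Vec2FeatureExtractor and DataCollatorForWav2Vec2Pretraining; they are not the real implementations, and the isinstance-style guard is just one way you might catch the mistake early in your own code.

```python
import inspect


class FeatureExtractor:
    """Hypothetical stand-in for Wav2Vec2FeatureExtractor (not the real class)."""

    def __init__(self, padding_value=0.0):
        self.padding_value = padding_value

    def pad(self, features):
        # Right-pad every sequence to the length of the longest one.
        longest = max(len(f) for f in features)
        return [list(f) + [self.padding_value] * (longest - len(f)) for f in features]


class Collator:
    """Hypothetical stand-in for DataCollatorForWav2Vec2Pretraining."""

    def __init__(self, feature_extractor):
        # Fail early with a clear message if a class was passed instead of an instance.
        if inspect.isclass(feature_extractor):
            raise TypeError(
                f"Expected an instance, got the class {feature_extractor.__name__}; "
                "instantiate it first."
            )
        self.feature_extractor = feature_extractor

    def __call__(self, features):
        return self.feature_extractor.pad(features)


# Correct: note the parentheses, which create an instance.
collator = Collator(FeatureExtractor())
batch = collator([[1.0, 2.0], [3.0]])

# Wrong: passing the class itself (the mistake described above).
try:
    Collator(FeatureExtractor)
except TypeError as err:
    class_error = str(err)
else:
    class_error = None
```

In the real script, the same idea applies: build the feature extractor object first, then hand that object to the data collator.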