AttributeError: 'str' object has no attribute 'dtype' when pretraining wav2vec2

I was trying to pretrain wav2vec 2.0 using this file. However, I get the following error when I reach the training phase:

AttributeError                            Traceback (most recent call last)
<ipython-input-38-9c63e3c0d6e0> in <module>()
      5 for epoch in range(starting_epoch, num_train_epochs):
      6     model.train()
----> 7     for step, batch in enumerate(train_dataloader):
      8         # compute num of losses
      9         num_losses = batch["mask_time_indices"].sum()

5 frames
/usr/local/lib/python3.7/dist-packages/accelerate/ in __iter__(self)
    328         # We iterate one batch ahead to check when we are at the end
    329         try:
--> 330             current_batch = next(dataloader_iter)
    331         except StopIteration:
    332             yield

/usr/local/lib/python3.7/dist-packages/torch/utils/data/ in __next__(self)
    433         if self._sampler_iter is None:
    434             self._reset()
--> 435         data = self._next_data()
    436         self._num_yielded += 1
    437         if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.7/dist-packages/torch/utils/data/ in _next_data(self)
    473     def _next_data(self):
    474         index = self._next_index()  # may raise StopIteration
--> 475         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    476         if self._pin_memory:
    477             data = _utils.pin_memory.pin_memory(data)

/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/ in fetch(self, possibly_batched_index)
     45         else:
     46             data = self.dataset[possibly_batched_index]
---> 47         return self.collate_fn(data)

<ipython-input-5-e1d5eabaa1e8> in __call__(self, features)
     38             padding=self.padding,
     39             pad_to_multiple_of=self.pad_to_multiple_of,
---> 40             return_tensors="pt",
     41         )

/usr/local/lib/python3.7/dist-packages/transformers/ in pad(self, processed_features, padding, max_length, truncation, pad_to_multiple_of, return_attention_mask, return_tensors)
    219                 if key not in batch_outputs:
    220                     batch_outputs[key] = []
--> 221                 if value.dtype is np.dtype(np.float64):
    222                     value = value.astype(np.float32)
    223                 batch_outputs[key].append(value)

AttributeError: 'str' object has no attribute 'dtype'

I am running the code from in a Jupyter notebook on Colab for testing purposes. When I printed the key and value mentioned in the code above, I got key = Path and value = \path\to\mp3\file\in\dataset for the mp3 file in my custom dataset.
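The error itself is easy to reproduce: the padding code assumes every feature value is a NumPy array with a `.dtype`, so any string that slips into the batch (such as a file-path column) raises exactly this AttributeError. A minimal illustration of the failing check (a sketch, not the actual transformers code path):

```python
import numpy as np

def to_float32(value):
    # Mirrors the check in the pad() snippet above: downcast float64 arrays.
    if value.dtype is np.dtype(np.float64):  # fails when value is a str
        value = value.astype(np.float32)
    return value

audio = np.zeros(4, dtype=np.float64)
print(to_float32(audio).dtype)  # float32

try:
    to_float32("path/to/file.mp3")  # a leftover string column in the batch
except AttributeError as e:
    print(e)  # 'str' object has no attribute 'dtype'
```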

Does anyone know what's going on here? @patrickvonplaten


I am facing the same issue. Did you find a solution/explanation?

Okay, I figured it out.

In my case it originated from passing the class Wav2Vec2FeatureExtractor to DataCollatorForWav2Vec2Pretraining instead of an instance of that class.

Make sure the feature extractor is initialized before passing it to the data collator.
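In other words, the collator must receive a feature-extractor object, not the class itself. A toy sketch of the difference, using stand-in classes rather than the real Wav2Vec2FeatureExtractor / DataCollatorForWav2Vec2Pretraining:

```python
class FeatureExtractor:
    """Stand-in for Wav2Vec2FeatureExtractor."""
    def pad(self, features):
        return {"input_values": features}

class DataCollator:
    """Stand-in for DataCollatorForWav2Vec2Pretraining."""
    def __init__(self, feature_extractor):
        self.feature_extractor = feature_extractor
    def __call__(self, features):
        # If feature_extractor is the class rather than an instance,
        # this calls the unbound method with `features` as `self`.
        return self.feature_extractor.pad(features)

collator = DataCollator(feature_extractor=FeatureExtractor())  # correct: an instance
print(collator([1.0, 2.0]))  # {'input_values': [1.0, 2.0]}

broken = DataCollator(feature_extractor=FeatureExtractor)  # bug: the class itself
# broken([1.0]) now fails, because pad() is called without an instance
```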

@mpierrau can you please share the solution?

Hi @omar47. I’m not sure we have the same original issue.

I see two alternative issues that may cause this:

  1. Passing the class Wav2Vec2FeatureExtractor to DataCollatorForWav2Vec2Pretraining. Solution: instantiate the feature extractor before passing it to the data collator instance:
    model = Wav2Vec2ForPreTraining.from_pretrained(args.model_path)
    feature_extractor = Wav2Vec2FeatureExtractor.from_pretrained(args.model_path)

    data_collator = DataCollatorForWav2Vec2Pretraining(
        model=model, feature_extractor=feature_extractor
    )
  2. The dataset is still a dictionary:
    This problem appeared again for me because I did not remove unused columns in the preprocessing step (using prepare_dataset). Because of this the data is still in dictionary form, which I think is not expected by the padding function in the data collator. Make sure that you keep the line remove_columns=raw_datasets["train"].column_names when mapping the prepare_dataset function to your dataset:
    vectorized_datasets = raw_datasets.map(
        prepare_dataset, remove_columns=raw_datasets["train"].column_names
    )

These are the two causes I could identify; fixing them resolved the issue in my case. Good luck!
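To see why the leftover columns matter: if the path column survives preprocessing, it reaches the collator's pad() as a plain string, which is what triggers the AttributeError. A toy sketch of the `datasets.map` semantics, using plain dicts in place of a real `datasets.Dataset`:

```python
def prepare_dataset(example):
    # Stand-in for the real prepare_dataset: produce only model inputs.
    return {"input_values": [0.0] * 4}

raw = [{"path": "a.mp3", "audio": "..."}]

# Without remove_columns, map() merges new columns with the old ones,
# so the string-valued "path" column survives into the batches:
kept = [{**ex, **prepare_dataset(ex)} for ex in raw]
print(sorted(kept[0]))  # ['audio', 'input_values', 'path']

# With remove_columns=raw_datasets["train"].column_names,
# only the newly produced columns remain:
removed = [prepare_dataset(ex) for ex in raw]
print(sorted(removed[0]))  # ['input_values']
```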

Okay, that did resolve that error. Now I am seeing the following one:

Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/", line 43, in main
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/", line 910, in launch_command
  File "/usr/local/lib/python3.7/dist-packages/accelerate/commands/", line 397, in simple_launcher
    process = subprocess.Popen(cmd, env=current_env)
  File "/usr/lib/python3.7/", line 800, in __init__
    restore_signals, start_new_session)
  File "/usr/lib/python3.7/", line 1462, in _execute_child
    env_list.append(k + b'=' + os.fsencode(v))
  File "/usr/lib/python3.7/", line 812, in fsencode
    filename = fspath(filename)  # Does type-checking of `filename`.
TypeError: expected str, bytes or os.PathLike object, not NoneType

I tried running the code by copy-pasting the scripts into a Colab notebook. I was unable to access the values in train_dataloader; converting train_dataloader to a list gives the same error, so maybe this didn't resolve the error after all. I had edited the code to load the model and feature_extractor from pretrained checkpoints.

Hey @omar47,

Glad to see my previous reply was of help.

I haven't seen the error you posted above before, but a quick Google search turns up two similar issues where Google Colab combined with a new accelerate update may be the culprit: this one and this one.

For the first one, the solution was to downgrade accelerate to version 0.12.0 (pip install accelerate==0.12.0). If you try this, make sure to create a new virtual environment before downgrading, as I am not sure whether other Hugging Face packages depend on accelerate > 0.12.0.
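A minimal sketch of that workaround (the environment name is illustrative; on Colab you can also just run the pip line in a cell and restart the runtime):

```shell
# Create an isolated environment so the downgrade can't break other projects
python -m venv accel-env
source accel-env/bin/activate

# Pin accelerate to the version reported to work
pip install accelerate==0.12.0
python -c "import accelerate; print(accelerate.__version__)"
```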