AttributeError: 'str' object has no attribute 'dtype' when pretraining wav2vec2

I was trying to pretrain wav2vec 2.0 using this file. However, I get the following error when I reach the training phase:

AttributeError                            Traceback (most recent call last)
<ipython-input-38-9c63e3c0d6e0> in <module>()
      5 for epoch in range(starting_epoch, num_train_epochs):
      6     model.train()
----> 7     for step, batch in enumerate(train_dataloader):
      8         # compute num of losses
      9         num_losses = batch["mask_time_indices"].sum()

5 frames
/usr/local/lib/python3.7/dist-packages/accelerate/data_loader.py in __iter__(self)
    328         # We iterate one batch ahead to check when we are at the end
    329         try:
--> 330             current_batch = next(dataloader_iter)
    331         except StopIteration:
    332             yield

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in __next__(self)
    433         if self._sampler_iter is None:
    434             self._reset()
--> 435         data = self._next_data()
    436         self._num_yielded += 1
    437         if self._dataset_kind == _DatasetKind.Iterable and \

/usr/local/lib/python3.7/dist-packages/torch/utils/data/dataloader.py in _next_data(self)
    473     def _next_data(self):
    474         index = self._next_index()  # may raise StopIteration
--> 475         data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
    476         if self._pin_memory:
    477             data = _utils.pin_memory.pin_memory(data)

/usr/local/lib/python3.7/dist-packages/torch/utils/data/_utils/fetch.py in fetch(self, possibly_batched_index)
     45         else:
     46             data = self.dataset[possibly_batched_index]
---> 47         return self.collate_fn(data)

<ipython-input-5-e1d5eabaa1e8> in __call__(self, features)
     38             padding=self.padding,
     39             pad_to_multiple_of=self.pad_to_multiple_of,
---> 40             return_tensors="pt",
     41         )
     42 

/usr/local/lib/python3.7/dist-packages/transformers/feature_extraction_sequence_utils.py in pad(self, processed_features, padding, max_length, truncation, pad_to_multiple_of, return_attention_mask, return_tensors)
    219                 if key not in batch_outputs:
    220                     batch_outputs[key] = []
--> 221                 if value.dtype is np.dtype(np.float64):
    222                     value = value.astype(np.float32)
    223                 batch_outputs[key].append(value)

AttributeError: 'str' object has no attribute 'dtype'

I am running the code from run_wav2vec2_pretraining_no_trainer.py in a Jupyter notebook on Colab for testing purposes. When I printed the key and value at the failing line of pad() shown above, I got key = Path and value=\path\to\mp3\file\in\dataset, i.e. the path to an mp3 file in my custom dataset. So it looks like a string column from my dataset is reaching pad() alongside the audio features.
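To narrow it down, here is a minimal sketch that reproduces the same message without the dataset or transformers. It assumes each feature dict still carries a string entry next to the audio array. The names `feature`, `cast_like_pad`, and `keep_array_columns`, and the placeholder file name, are all hypothetical: `cast_like_pad` only imitates the dtype check from the traceback (using `==` for the comparison) and is not the library code.

```python
import numpy as np

# Hypothetical feature dict, mirroring what the collator might receive for one example.
feature = {
    "input_values": np.zeros(16, dtype=np.float64),
    "path": "clip_0001.mp3",  # placeholder string entry, not a real file
}

def cast_like_pad(features):
    """Imitates the float64 -> float32 cast seen in the traceback (not the real code)."""
    out = {}
    for key, value in features.items():
        # A plain str has no .dtype, so this line raises AttributeError for "path".
        if value.dtype == np.dtype(np.float64):
            value = value.astype(np.float32)
        out[key] = value
    return out

def keep_array_columns(features):
    """Hypothetical helper: keep only the entries that are NumPy arrays."""
    return {k: v for k, v in features.items() if isinstance(v, np.ndarray)}

error_message = None
try:
    cast_like_pad(feature)
except AttributeError as err:
    error_message = str(err)

# With the string entry dropped first, the cast goes through.
cleaned = cast_like_pad(keep_array_columns(feature))
```

If this is really the cause, dropping the non-audio columns from the dataset before building the dataloader (or filtering them inside the collator) should avoid the error, but I have not confirmed that this is the intended usage of the script.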

Does anyone know what's going on here? @patrickvonplaten