ValueError: `mask_length` has to be smaller than `sequence_length`, but got `mask_length`: 10 and `sequence_length`: 4` when fine-tuning wav2vec 2.0

I’m getting a ValueError when I run trainer.train().

For training, I’m following this notebook: https://www.kaggle.com/code/ajax0564/wave2vec2-0-fine-tuning-english
I downloaded the notebook and am running it in DataSpell, since Google Colab gives me trouble when I load my dataset.

I’m fine-tuning this model: wav2vec2-base-mine/. As data I use my own custom audio dataset, which contains .wav files and their transcripts in a CSV; I have done all the preprocessing. I’m wondering if there is a solution to this, since I’ve been stuck on this step for a while. Perhaps there is a way to edit mask_length, but I don’t see it — maybe something like the config override sketched below?
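
This is the kind of override I was hoping exists (just a sketch, untested — I’m assuming mask_time_length in Wav2Vec2Config is the value behind this check, and wav2vec2-base-mine/ is my local checkpoint directory):

```python
from transformers import Wav2Vec2ForCTC

# Sketch: lower mask_time_length (default 10) so it stays below the
# shortest sequence_length my data produces after feature extraction.
model = Wav2Vec2ForCTC.from_pretrained(
    "wav2vec2-base-mine/",  # my local checkpoint directory
    mask_time_length=2,     # default is 10; the error reports sequence_length 4
)
```

Is that the right approach, or does it weaken the masking too much?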

The traceback is below:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
~\AppData\Local\Temp\ipykernel_14780\49973641.py in <module>
----> 1 trainer.train()

~\anaconda3\lib\site-packages\transformers\trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1541             self._inner_training_loop, self._train_batch_size, args.auto_find_batch_size
   1542         )
-> 1543         return inner_training_loop(
   1544             args=args,
   1545             resume_from_checkpoint=resume_from_checkpoint,

~\anaconda3\lib\site-packages\transformers\trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1789                         tr_loss_step = self.training_step(model, inputs)
   1790                 else:
-> 1791                     tr_loss_step = self.training_step(model, inputs)
   1792 
   1793                 if (

~\anaconda3\lib\site-packages\transformers\trainer.py in training_step(self, model, inputs)
   2537 
   2538         with self.compute_loss_context_manager():
-> 2539             loss = self.compute_loss(model, inputs)
   2540 
   2541         if self.args.n_gpu > 1:

~\anaconda3\lib\site-packages\transformers\trainer.py in compute_loss(self, model, inputs, return_outputs)
   2569         else:
   2570             labels = None
-> 2571         outputs = model(**inputs)
   2572         # Save past state if it exists
   2573         # TODO: this needs to be fixed and made cleaner later.

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

~\anaconda3\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py in forward(self, input_values, attention_mask, output_attentions, output_hidden_states, return_dict, labels)
   1678         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
   1679 
-> 1680         outputs = self.wav2vec2(
   1681             input_values,
   1682             attention_mask=attention_mask,

~\anaconda3\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
   1192         if not (self._backward_hooks or self._forward_hooks or self._forward_pre_hooks or _global_backward_hooks
   1193                 or _global_forward_hooks or _global_forward_pre_hooks):
-> 1194             return forward_call(*input, **kwargs)
   1195         # Do not call functions when jit is used
   1196         full_backward_hooks, non_full_backward_hooks = [], []

~\anaconda3\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py in forward(self, input_values, attention_mask, mask_time_indices, output_attentions, output_hidden_states, return_dict)
   1309 
   1310         hidden_states, extract_features = self.feature_projection(extract_features)
-> 1311         hidden_states = self._mask_hidden_states(
   1312             hidden_states, mask_time_indices=mask_time_indices, attention_mask=attention_mask
   1313         )

~\anaconda3\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py in _mask_hidden_states(self, hidden_states, mask_time_indices, attention_mask)
   1252             hidden_states[mask_time_indices] = self.masked_spec_embed.to(hidden_states.dtype)
   1253         elif self.config.mask_time_prob > 0 and self.training:
-> 1254             mask_time_indices = _compute_mask_indices(
   1255                 (batch_size, sequence_length),
   1256                 mask_prob=self.config.mask_time_prob,

~\anaconda3\lib\site-packages\transformers\models\wav2vec2\modeling_wav2vec2.py in _compute_mask_indices(shape, mask_prob, mask_length, attention_mask, min_masks)
    161 
    162     if mask_length > sequence_length:
--> 163         raise ValueError(
    164             f"`mask_length` has to be smaller than `sequence_length`, but got `mask_length`: {mask_length}"
    165             f" and `sequence_length`: {sequence_length}`"

ValueError: `mask_length` has to be smaller than `sequence_length`, but got `mask_length`: 10 and `sequence_length`: 4`
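
Reading the traceback, I suspect some of my clips are extremely short: the convolutional feature extractor in wav2vec2-base downsamples raw 16 kHz audio by a factor of about 320, so `sequence_length`: 4 would correspond to roughly 1,280 samples (~0.08 s). Would filtering such clips out be the right fix? A sketch of what I mean (assuming my dataset is a datasets.Dataset with an input_values column, as in the notebook):

```python
MIN_SAMPLES = 16000  # assumption: keep only clips of at least 1 s at 16 kHz

# Drop examples too short to fit the default mask_time_length of 10 frames
dataset = dataset.filter(lambda ex: len(ex["input_values"]) >= MIN_SAMPLES)
```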

@patrickvonplaten Can you maybe give some thoughts?