Can Wav2vec 2.0 be pretrained using only 3 hours of speech?

Hello all,

Can Wav2vec 2.0 be pretrained using only 3 hours of speech?

If not, how much data is required for the pretraining process?

Thanks.

Hi, I suggest plotting a learning curve (model performance against the amount of training data) to check whether it has flattened out. I found a paper that experiments with the number of instances needed to fine-tune wav2vec for a classification task. I think 3 hours is not impossible, depending on your data and task, so you can try it and check the performance. A learning curve will show more clearly whether you need to acquire more data points: if the score is still rising at the largest subset, more data is likely to help.
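To make the learning-curve idea concrete, here is a minimal sketch in plain Python. It is only an illustration under assumptions: `learning_curve`, `has_flattened`, and `toy_train_and_score` are hypothetical names I made up, and the toy scorer is a stand-in for "train or fine-tune on this subset and return a validation score", which you would replace with your own training and evaluation code.

```python
import random
from typing import Callable, List, Sequence, Tuple


def learning_curve(
    examples: Sequence,
    train_and_score: Callable[[Sequence], float],
    fractions: Sequence[float] = (0.1, 0.25, 0.5, 0.75, 1.0),
    seed: int = 0,
) -> List[Tuple[int, float]]:
    """Train on growing subsets and record (subset size, validation score)."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)  # shuffle once so each subset contains the previous one
    points = []
    for frac in fractions:
        n = max(1, int(len(shuffled) * frac))
        points.append((n, train_and_score(shuffled[:n])))
    return points


def has_flattened(points: List[Tuple[int, float]], tolerance: float = 0.01) -> bool:
    """Return True if the score changed by less than `tolerance` over the last step."""
    if len(points) < 2:
        return False
    return abs(points[-1][1] - points[-2][1]) < tolerance


def toy_train_and_score(subset: Sequence) -> float:
    """Hypothetical stand-in: the score improves with data, with diminishing returns."""
    return 1.0 - 1.0 / (1 + len(subset))


# Assumption: 1000 placeholder items stand in for your utterances.
points = learning_curve(range(1000), toy_train_and_score)
for n, score in points:
    print(f"{n:5d} examples -> score {score:.4f}")
print("flattened:", has_flattened(points))
```

You can pass the `(size, score)` pairs to any plotting library to draw the curve. The `tolerance` threshold is arbitrary and should be chosen relative to the metric you report, for example word error rate or accuracy.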

Thank you @pearlyu. Do you have a link to the paper you mentioned?