I would like to run wav2vec2 (self-supervised) pretraining on custom wav files. Is there an easy way to do this? I’m aware of the run_wav2vec2_pretraining_no_trainer script, but I’m not sure how to pass (or create) the custom dataset.
Thanks a lot for your reply, @mfox. I assume that audio_paths here is not a string but something like {"audio": [list of wav_paths]}, is that right?
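For reference, here is a minimal sketch of what building such a dataset might look like, assuming the `datasets` library is installed and the wav files sit under one folder. The helper names (`collect_wav_paths`, `build_audio_dataset`), the folder names, and the 16 kHz sampling rate are my own assumptions for illustration, not something taken from the script.

```python
from pathlib import Path


def collect_wav_paths(root):
    """Return a sorted list of .wav file paths (as strings) found under root."""
    return sorted(str(p) for p in Path(root).rglob("*.wav"))


def build_audio_dataset(root, sampling_rate=16_000):
    """Build a datasets.Dataset with a single "audio" column from wav files."""
    # Imported lazily so the path helper above works without `datasets` installed.
    from datasets import Audio, Dataset

    audio_paths = collect_wav_paths(root)
    # The dict maps the column name to the list of file paths.
    dataset = Dataset.from_dict({"audio": audio_paths})
    # Casting to Audio makes each row decode to an array plus sampling rate on access.
    # The 16 kHz default is an assumption; adjust it to match your model.
    return dataset.cast_column("audio", Audio(sampling_rate=sampling_rate))


if __name__ == "__main__":
    # "my_wavs" and "my_wav_dataset" are placeholder paths.
    ds = build_audio_dataset("my_wavs")
    ds.save_to_disk("my_wav_dataset")
```

So the value passed to `Dataset.from_dict` is a dict of column name to list of paths, not a single string.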
I created and saved the dataset as you suggested, but I am still getting an error:
ValueError: You are trying to load a dataset that was saved using save_to_disk. Please use load_from_disk instead.
It seems that I could change this line of the script to use load_from_disk, and also edit the definition of the train and validation splits somehow. Is that the recommended solution?
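A rough sketch of that change, assuming the dataset was written with `save_to_disk` and that the rest of the script only needs "train" and "validation" splits. The function name `load_train_validation` and the `validation_fraction` and `seed` parameters are hypothetical; I have not checked how the script itself names or builds its splits.

```python
def load_train_validation(dataset_path, validation_fraction=0.01, seed=0):
    """Load a dataset saved with save_to_disk and return train/validation splits."""
    # Imported lazily; requires the `datasets` library.
    from datasets import DatasetDict, load_from_disk

    loaded = load_from_disk(dataset_path)
    # save_to_disk may have been called on a DatasetDict; if so, use its "train" split.
    if isinstance(loaded, DatasetDict):
        loaded = loaded["train"]
    # train_test_split returns a DatasetDict with "train" and "test" keys.
    split = loaded.train_test_split(test_size=validation_fraction, seed=seed)
    # Rename "test" to "validation" (assumed to be the name the script expects).
    return DatasetDict({"train": split["train"], "validation": split["test"]})
```

The returned object could then stand in for whatever the script builds from `load_dataset`, for example `raw_datasets = load_train_validation("my_wav_dataset")`, where the path is a placeholder.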