Hey!
I’m trying to fine-tune the Wav2Vec2 model on my own dataset (100K audio files) merged with the French Common Voice dataset (89 GB in total), but the preprocessing step gets killed at 44%. I tried removing the longest audio files (keeping only clips <= 3 s) and it works, but what I don’t understand is that I have 128 GB of RAM — shouldn’t that be enough?
Can I load and preprocess the data on the fly as the batches are loaded, instead of processing the whole dataset before launching the training phase?
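Something like this is what I have in mind — just a rough sketch, assuming the Hugging Face `datasets`/`transformers` stack, where `prepare_batch` is a placeholder for my own preprocessing and the column names (`audio`, `sentence`) are taken from Common Voice:

```python
from datasets import load_dataset, Audio
from transformers import Wav2Vec2Processor

# Placeholder checkpoint; I would use my own processor here
processor = Wav2Vec2Processor.from_pretrained("facebook/wav2vec2-large-xlsr-53")

# Load Common Voice French (column names assumed: "audio", "sentence")
dataset = load_dataset("common_voice", "fr", split="train")
dataset = dataset.cast_column("audio", Audio(sampling_rate=16_000))

def prepare_batch(batch):
    # Runs lazily on each accessed batch instead of in one big .map() pass
    arrays = [a["array"] for a in batch["audio"]]
    batch["input_values"] = processor.feature_extractor(
        arrays, sampling_rate=16_000
    ).input_values
    batch["labels"] = processor.tokenizer(batch["sentence"]).input_ids
    return batch

# set_transform applies prepare_batch only when examples are read,
# so nothing is precomputed or written to the cache up front
dataset.set_transform(prepare_batch)
```

Would this (or `load_dataset(..., streaming=True)`) avoid holding the whole preprocessed dataset in RAM?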
Thanks for your reply!