I am currently fine-tuning a Wav2Vec2 model and would like to know how I can modify training samples during training.
In TensorFlow this is possible by calling dataset.map(), but doing the same with a datasets.arrow_dataset.Dataset writes a cache file. On top of that, the result is static (computed once rather than each time a sample is read), and the cache file is unnecessary for my purposes.
There is the data_collator, and one could do it there. However, this is probably inefficient, and I would have to take care of parallelization myself.
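To make the collator idea concrete, here is a minimal sketch of what I mean. It is not taken from any library: `add_noise` and `AugmentingCollator` are hypothetical names, the "input_values" key is assumed to match what the feature extractor produces, and plain Python lists stand in for tensors so the example runs without extra dependencies. A real collator would normally pad with the processor and return tensors instead.

```python
import random
from typing import Callable, Dict, List

Sample = Dict[str, List[float]]


def add_noise(values: List[float], scale: float = 0.01) -> List[float]:
    """Hypothetical augmentation: add small Gaussian noise to each value."""
    return [v + random.gauss(0.0, scale) for v in values]


class AugmentingCollator:
    """Applies `transform` to every sample, then zero-pads to the longest one."""

    def __init__(self, transform: Callable[[List[float]], List[float]]):
        self.transform = transform

    def __call__(self, features: List[Sample]) -> Dict[str, list]:
        # This runs each time a batch is assembled, so a random transform
        # yields different samples on every pass and nothing is cached.
        augmented = [self.transform(f["input_values"]) for f in features]
        max_len = max(len(a) for a in augmented)
        # Zero-pad on the right and record which positions hold real data.
        padded = [a + [0.0] * (max_len - len(a)) for a in augmented]
        mask = [[1] * len(a) + [0] * (max_len - len(a)) for a in augmented]
        return {"input_values": padded, "attention_mask": mask}
```

An instance such as `AugmentingCollator(add_noise)` could then be passed wherever a data_collator callable is expected. The drawback I mentioned remains: the transform runs sequentially over the batch inside the collator.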
I’m looking for a way to process input examples on the fly, similar to Dataset.map().