What data should I validate my model against while training?

unknownTransformer · April 27, 2021, 12:17pm

Hello guys,

I want to research the impact of different data augmentation techniques for STT models. For this I am using a pretrained Wav2Vec2 model trained on Common Voice. I want to test the impact of augmenting the data by 1) adding a completely new dataset, 2) adversarial attacks, and 3) possibly TTS-synthesized data.

What should I validate the model against while training? For my final comparison I want to use the test split of Common Voice, but can I just reuse that dataset for my validation step (while training), or should I use a split of whatever I am training on?

Since English is not my native language, I'll try to explain my problem with a graphic:

[Image: test]

I am using a pretrained Wav2Vec2 model and would like to train it further with "augmented data": for example, a completely new dataset, TTS-synthesized data, or adversarial examples generated from the Common Voice training data.

I will compare the augmentation methods later using the Common Voice test split. Now, when I train my models, what should I validate them on during training? Should I split the new (augmented) training data into 80% training and 20% validation, or should I train on 100% of the augmented data and validate on the Common Voice test split? Is it problematic to use the same dataset split both for validation during training and for the final comparison (as shown in the graphic above, see the dataset at the top right)?
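To make the first option concrete, here is a minimal sketch of what I mean by the 80/20 split, with the Common Voice test split kept aside for the final comparison only. It uses just the Python standard library; the file names, the dataset sizes, and the helper `split_train_validation` are made up for illustration and are not from any real library or from my actual setup.

```python
import random


def split_train_validation(items, validation_fraction=0.2, seed=42):
    """Shuffle items reproducibly and split them into (train, validation)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_val = int(len(shuffled) * validation_fraction)
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical clip identifiers standing in for the augmented training data.
augmented_clips = [f"augmented_{i:04d}.wav" for i in range(1000)]
# Hypothetical identifiers standing in for the Common Voice test split.
common_voice_test = [f"cv_test_{i:04d}.wav" for i in range(200)]

train_clips, validation_clips = split_train_validation(augmented_clips)

# In this option the test split is never touched during training or validation.
assert not set(train_clips) & set(common_voice_test)
assert not set(validation_clips) & set(common_voice_test)

print(len(train_clips), len(validation_clips), len(common_voice_test))
```

The second option would instead pass all of `augmented_clips` to training and use `common_voice_test` for validation as well, which is the case I am unsure about.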

Thanks in advance.
