Collapsing Wav2Vec2 pretraining loss

I’m trying to pretrain a Wav2Vec2 model based on the example given here.

I was initially getting a contrastive loss like the graph on the left, which seemed very slow, so I increased the learning rate and, after only a few steps, got the graph on the right.

I’m not familiar with the nuts and bolts of contrastive loss, so this came as a bit of a surprise; I was wondering if anyone could help me understand what’s happening.

The batch size (with accumulation) is 32, the number of epochs is 20, and the warmup steps are 1200 for both attempts.


The solution in the end was to set return_attention_mask to True in the feature extractor, or use a pretrained feature extractor and model that prefers attention masks (i.e. not wav2vec2-base).
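For anyone hitting the same issue, here is a minimal sketch of the fix, assuming the Hugging Face transformers API (the constructor arguments below are the library defaults except for `return_attention_mask`):

```python
import numpy as np
from transformers import Wav2Vec2FeatureExtractor

# Enable attention masks so that padded positions can be excluded
# from normalization and from the contrastive loss.
feature_extractor = Wav2Vec2FeatureExtractor(
    feature_size=1,
    sampling_rate=16000,
    padding_value=0.0,
    do_normalize=True,
    return_attention_mask=True,  # the fix: defaults to False, as in wav2vec2-base
)

# Two clips of different lengths, so padding actually kicks in.
clips = [np.random.randn(16000), np.random.randn(8000)]
batch = feature_extractor(
    clips,
    sampling_rate=16000,
    padding=True,
    return_tensors="np",
)

print(batch["input_values"].shape)    # (2, 16000) after padding
print(batch["attention_mask"].shape)  # (2, 16000): one entry per audio sample
```

The mask is 1 over real samples and 0 over padding, so the second clip's mask is zero from index 8000 onward.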


Hi, thank you for sharing your experience and the solution. Concerning the attention mask: do you pass the attention mask returned by the feature extractor directly to the model (at the sample level, not the frame level, right)? Were you able to check the shapes?

Thanks
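For what it's worth, the sample-level vs. frame-level distinction can be sketched numerically: the feature extractor's mask has one entry per raw audio sample, and the model downsamples it internally through its convolutional feature encoder. A small sketch, assuming the default wav2vec2-base conv stack (kernel sizes and strides from the default Wav2Vec2Config):

```python
def conv_output_length(length: int, kernel: int, stride: int) -> int:
    # Length formula for a 1-D convolution with no padding:
    # floor((length - kernel) / stride) + 1
    return (length - kernel) // stride + 1

# Defaults from Wav2Vec2Config (wav2vec2-base): seven conv layers.
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)

def frame_level_length(num_samples: int) -> int:
    """Map a sample-level length to the frame-level length the encoder sees."""
    length = num_samples
    for k, s in zip(KERNELS, STRIDES):
        length = conv_output_length(length, k, s)
    return length

# One second of 16 kHz audio -> 49 encoder frames, so a (batch, 16000)
# sample-level mask corresponds to a (batch, 49) frame-level mask.
print(frame_level_length(16000))  # 49
```

So you only ever hand the model the sample-level mask from the feature extractor; the frame-level one is derived inside the model.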