Wav2Vec2: fix growing training and validation loss after a few epochs

Hi,

I’m using Wav2Vec2ForCTC.from_pretrained("facebook/wav2vec2-large-xlsr-53") to fine-tune on a Lithuanian dataset. I’ve limited the dataset to 100 hours of recordings, each between 1 and 15 seconds long.
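For context, the duration filter looks roughly like this (a minimal sketch with 🤗 Datasets; the "speech" column name and the 16 kHz sampling rate are assumptions based on the notebook’s preprocessing, not copied from my code):

SAMPLING_RATE = 16_000  # assumed; XLSR expects 16 kHz audio
MIN_SEC, MAX_SEC = 1.0, 15.0

def in_duration_range(example):
    # example["speech"] is assumed to hold the raw waveform array
    duration = len(example["speech"]) / SAMPLING_RATE
    return MIN_SEC <= duration <= MAX_SEC

dataset_prepared = dataset_prepared.filter(in_duration_range)
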
I’m following the example from this notebook: Fine-Tune Wav2Vec2 for English ASR in Hugging Face with 🤗 Transformers by @patrickvonplaten.

My issue is that the training and validation losses steadily decrease for the first few epochs, and then all metrics start to worsen.

[Plots omitted: eval loss, WER, and train loss all turn upward after the initial decrease; the trend is least visible in the train loss.]

My configuration is:

from transformers import Wav2Vec2ForCTC, Trainer, TrainingArguments, EarlyStoppingCallback

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    activation_dropout=0.055,
    attention_dropout=0.094,
    hidden_dropout=0.047,
    feat_proj_dropout=0.04,
    mask_time_prob=0.082,
    layerdrop=0.041,
    gradient_checkpointing=True,
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)

# Keep the convolutional feature extractor frozen; only the transformer is fine-tuned
model.freeze_feature_extractor()

training_args = TrainingArguments(
    output_dir="/workspace/models/wav2vec-lt",
    group_by_length=True,
    per_device_train_batch_size=24,
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    num_train_epochs=30,
    fp16=True,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=1000,
    learning_rate=2.34e-4,
    warmup_steps=500,
    save_total_limit=20,
    load_best_model_at_end=True,
    greater_is_better=False,
    log_level='debug',
    dataloader_num_workers=6,
    metric_for_best_model="wer",
)

trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=dataset_prepared['train'],
    eval_dataset=dataset_prepared['valid'],
    tokenizer=processor.feature_extractor,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=5, early_stopping_threshold=0.0001)],
)
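
For reference, data_collator is the CTC padding collator from the notebook, and compute_metrics is essentially the notebook’s WER computation (reconstructed from the notebook rather than copied from my code):

import numpy as np
from datasets import load_metric

wer_metric = load_metric("wer")

def compute_metrics(pred):
    # Greedy CTC decoding of the logits
    pred_ids = np.argmax(pred.predictions, axis=-1)

    # The collator masks label padding with -100; restore the pad token id
    # so the tokenizer can decode the references
    pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id

    pred_str = processor.batch_decode(pred_ids)
    label_str = processor.batch_decode(pred.label_ids, group_tokens=False)

    return {"wer": wer_metric.compute(predictions=pred_str, references=label_str)}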

The model params were taken from another wav2vec2 fine-tuning example.
At first I had a higher learning rate, which I later reduced to the current value.
I was thinking of using hyperparameter tuning (HPT) to find the best-performing params, but I’d like to resolve this problem first.
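
For the record, what I had in mind was Trainer.hyperparameter_search with the Optuna backend, something like the sketch below (I haven’t actually run this; the search ranges and trial count are illustrative only):

def model_init():
    # Re-create the model for every trial so runs start from the same weights
    return Wav2Vec2ForCTC.from_pretrained(
        "facebook/wav2vec2-large-xlsr-53",
        ctc_loss_reduction="mean",
        pad_token_id=processor.tokenizer.pad_token_id,
        vocab_size=len(processor.tokenizer),
    )

def hp_space(trial):
    # Illustrative search ranges, not tuned values
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-5, 3e-4, log=True),
        "warmup_steps": trial.suggest_int("warmup_steps", 0, 1000),
    }

hpt_trainer = Trainer(
    model_init=model_init,  # hyperparameter_search needs model_init instead of model
    args=training_args,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    train_dataset=dataset_prepared['train'],
    eval_dataset=dataset_prepared['valid'],
    tokenizer=processor.feature_extractor,
)

best_run = hpt_trainer.hyperparameter_search(
    direction="minimize",
    hp_space=hp_space,
    compute_objective=lambda metrics: metrics["eval_wer"],  # lower WER is better
    n_trials=10,
)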

Any advice?

Have you been able to resolve the issue? I’m facing the same problem.

Yes, I eventually managed to solve it by playing around with various parameters when loading the pretrained model, batch sizes, and so on.

Can you share the final parameters you used?

Could you share your configurations? I have the same issue. Thank you

It was a long time ago and I’ve suspended the project for now.
Below are the settings for my last experiment.

You could also look into Edresson/Wav2Vec-Wrapper on GitHub (an easy way to fine-tune Wav2Vec 2.0 for low-resource languages).
It was very helpful for me.

model = Wav2Vec2ForCTC.from_pretrained(
    "facebook/wav2vec2-large-xlsr-53",
    gradient_checkpointing=True,
    ctc_loss_reduction="mean",
    pad_token_id=processor.tokenizer.pad_token_id,
    vocab_size=len(processor.tokenizer),
)

model.freeze_feature_extractor()

training_args = TrainingArguments(
    output_dir="/workspace/notebooks/models/wav2vec-lt",
    group_by_length=False,
    per_device_train_batch_size=24,
    gradient_accumulation_steps=2,
    evaluation_strategy="steps",
    num_train_epochs=30,
    fp16=True,
    save_steps=1000,
    eval_steps=1000,
    logging_steps=1000,
    learning_rate=4.42184e-05,
    warmup_steps=500,
    save_total_limit=10,
    load_best_model_at_end=True,
    greater_is_better=False,
    metric_for_best_model="wer",
    weight_decay=0.0354792,
)

trainer = Trainer(
    model=model,
    data_collator=data_collator,
    args=training_args,
    compute_metrics=compute_metrics,
    train_dataset=dataset_prepared['train'],
    eval_dataset=dataset_prepared['valid'],
    tokenizer=processor.feature_extractor,
    callbacks=[EarlyStoppingCallback(early_stopping_patience=3)],
)