Different evaluation results during and after training: Wav2Vec2 finetuning


With the notebook Fine-Tune Wav2Vec2 for English ASR with :hugs: Transformers, I get different results between evaluation during training (via compute_metrics) and evaluation after training, on the same test dataset. Please find the details below.

  • To save time, I select part of the dataset with

timit["train"] = timit["train"].select(range(1000))
timit["test"] = timit["test"].select(range(500))

  • At the end of training, the Trainer logs

Loading best model from ./checkpoint-2500 (score: 0.605413556098938).

This score is the validation loss; the WER that compute_metrics reported for that checkpoint is 0.439.
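For reference, WER is word-level edit distance divided by the number of reference words. A minimal sketch in plain Python (no `datasets`/`jiwer` dependency; the function name `wer` here is just illustrative):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    # One-row dynamic-programming edit distance over words
    d = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, d[0] = d[0], i
        for j, h in enumerate(hyp, 1):
            prev, d[j] = d[j], min(d[j] + 1,         # deletion
                                   d[j - 1] + 1,     # insertion
                                   prev + (r != h))  # substitution
    return d[-1] / len(ref)

print(wer("a b c", "a x c"))  # 1 substitution over 3 reference words -> 1/3
```

One thing worth checking: the `wer` metric used by compute_metrics is, as far as I know, corpus-level (total edits divided by total reference words), so averaging per-utterance WERs in a `map()`-based evaluation would give a different number on the same predictions.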

  • With the loaded best model, I evaluate by mapping a prediction function over the test set with map(). This gives

Test WER: 0.386
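In case it helps to localize the gap: both code paths should reduce logits to text the same way (argmax over the vocabulary, then a CTC-style decode as `processor.batch_decode` does). A toy sketch of greedy CTC decoding, with a made-up vocabulary (`ID_TO_CHAR`, `PAD_ID`, and the ids are hypothetical; Wav2Vec2 uses its pad token as the CTC blank):

```python
import itertools

# Hypothetical toy vocab; in the notebook the processor's tokenizer holds the real one.
ID_TO_CHAR = {1: "a", 2: "b", 3: "|"}  # "|" is the word delimiter
PAD_ID = 0  # pad doubles as the CTC blank

def ctc_greedy_decode(ids):
    """Collapse repeated ids, drop the blank/pad, map the rest to characters."""
    collapsed = (k for k, _ in itertools.groupby(ids))
    chars = [ID_TO_CHAR[i] for i in collapsed if i != PAD_ID]
    return "".join(chars).replace("|", " ").strip()

print(ctc_greedy_decode([1, 1, 0, 2, 0, 3, 3, 1]))  # "ab a"
```

If the two evaluations handle padding differently (e.g., compute_metrics replaces -100 label ids with the pad token before decoding, while the `map()` path decodes raw predictions, or the batch sizes differ), the decoded strings, and hence the WER, can diverge.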

I expected the two numbers to be almost the same. If you have any ideas about what causes the gap, please leave a comment.