Inconsistent evaluation result (WER) when fine-tuning a pretrained wav2vec2 model

Hi guys. I am trying to fine-tune the pretrained wav2vec2 model (facebook/wav2vec2-large-lv60) for ASR. I followed this article (written by @patrickvonplaten) to train and evaluate the model. While trainer.train() is running, the reported evaluation WER is about 18% (you can refer to here). However, after training finished and I re-evaluated the same evaluation set (with the same procedure as in the guiding article), the WER came out at 40%. I am a bit confused by this inconsistency and would like to know if anyone has had a similar experience. Any help will be appreciated!
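
Concretely, the ~18% figure is the eval_wer that the Trainer logs from compute_metrics during training, i.e. what a call like this computes (a minimal sketch; `trainer` is the Trainer instance built as in the article):

    # runs the Trainer's evaluation loop with compute_metrics; the metric
    # keys come back with an "eval_" prefix
    eval_results = trainer.evaluate()
    print(eval_results["eval_wer"])  # the WER reported during training (~0.18 for me)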

By the way, in my training process I only used limited training data (the first 300 samples of TIMIT's original training set) and a separate evaluation set (the 301st–350th samples of the same training set).
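
A minimal sketch of that split, assuming `timit` is the dataset loaded via `load_dataset("timit_asr")` as in the article (the `train_set`/`eval_set` names are mine, not the article's):

    train_set = timit["train"].select(range(300))      # first 300 samples
    eval_set = timit["train"].select(range(300, 350))  # 301st-350th samples

The compute_metrics passed to the trainer and the final evaluation procedure are the same as in the guiding article, but let me re-post them here for your convenience: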

    def compute_metrics(pred):
        pred_logits = pred.predictions
        pred_ids = np.argmax(pred_logits, axis=-1)

        # -100 marks padded positions in the labels; map them back to the pad
        # token id so they decode cleanly
        pred.label_ids[pred.label_ids == -100] = processor.tokenizer.pad_token_id
        pred_str = processor.batch_decode(pred_ids)
        # we do not want to group tokens when computing the metrics
        label_str = processor.batch_decode(pred.label_ids, group_tokens=False)

        wer = wer_metric.compute(predictions=pred_str, references=label_str)

        return {"wer": wer}

    def map_to_result(batch):
        with torch.no_grad():
            input_values = torch.tensor(batch["input_values"], device="cuda").unsqueeze(0)
            logits = model(input_values).logits

        pred_ids = torch.argmax(logits, dim=-1)
        batch["pred_str"] = processor.batch_decode(pred_ids)
        batch["text"] = processor.batch_decode(batch["labels"], group_tokens=False)
        return batch
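
The final WER (the number that comes out around 40%) is then computed by mapping map_to_result over the evaluation set, as in the article (`eval_set` is the split sketched above):

    results = eval_set.map(map_to_result, remove_columns=eval_set.column_names)
    print("Eval WER: {:.3f}".format(
        wer_metric.compute(predictions=results["pred_str"], references=results["text"])))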

Have you solved this problem? I am running into the same issue.