Hello everyone, I am trying to fine-tune whisper-large-v3 on my own dataset.
I followed Bofeng Huang's tutorial (on Medium here) with some tweaks.
The problem appears at validation: when compute_metrics is called, the function cannot find the tokenizer.
Is this normal? Has the Seq2SeqTrainer changed since the tutorial was written?
def compute_metrics(pred, do_normalize_eval=False):
    pred_ids = pred.predictions
    label_ids = pred.label_ids

    # replace -100 with the pad_token_id
    # label_ids[label_ids == -100] = tokenizer.pad_token_id

    # we do not want to group tokens when computing the metrics
    pred_str = tokenizer.batch_decode(pred_ids, skip_special_tokens=True)
    label_str = tokenizer.batch_decode(label_ids, skip_special_tokens=True)

    if do_normalize_eval:
        pred_str = [normalizer(pred) for pred in pred_str]
        # perhaps already normalised
        label_str = [normalizer(label) for label in label_str]

    # filtering step to only evaluate the samples that correspond to non-empty references
    pred_str = [pred_str[i] for i in range(len(pred_str)) if len(label_str[i]) > 0]
    label_str = [label_str[i] for i in range(len(label_str)) if len(label_str[i]) > 0]

    wer = metric.compute(predictions=pred_str, references=label_str)
    return {"wer": wer}
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=vectorized_datasets["train"],
    eval_dataset=vectorized_datasets["test"],
    tokenizer=processor,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)
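One workaround I am considering, in case the issue is that compute_metrics only sees the tokenizer as a module-level global: pass the tokenizer (and the metric) explicitly with functools.partial, so the function no longer depends on globals. This is just a sketch of the idea, not the tutorial's code; binding processor.tokenizer is my assumption about where the tokenizer lives.

```python
from functools import partial

def compute_metrics(pred, tokenizer, metric):
    # decode with the tokenizer that was passed in explicitly,
    # instead of relying on a module-level `tokenizer` variable
    pred_str = tokenizer.batch_decode(pred.predictions, skip_special_tokens=True)
    label_str = tokenizer.batch_decode(pred.label_ids, skip_special_tokens=True)
    wer = metric.compute(predictions=pred_str, references=label_str)
    return {"wer": wer}

# then hand the bound function to the trainer (assuming `processor` and
# `metric` are the objects from the tutorial):
# compute_metrics=partial(compute_metrics,
#                         tokenizer=processor.tokenizer,
#                         metric=metric)
```

Seq2SeqTrainer only requires that compute_metrics accept a single EvalPrediction argument, so a partial with the other arguments pre-bound should be a drop-in replacement.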