Fine-tune Whisper: tensor size mismatch

Hello! I’m trying to follow this blog in order to fine-tune Whisper on my dataset. While training, I’m getting a tensor size mismatch error.

Although, while preparing my data, I filtered the label lengths to be less than the model’s max_length (448), as @sanchit-gandhi suggested, I’m still getting the same error :face_with_diagonal_mouth:
Here is the link to my Colab notebook.
What can I do?

Hey @RetaSy! Sorry for the delay in getting back to you! Unfortunately I can’t access your notebook (I need permissions!). Feel free to update the sharing settings and ping me here; I can then take a more detailed look!

In the meantime, could you double-check that the extra filter step is implemented before you instantiate the Trainer:

max_label_length = model.config.max_length

def filter_labels(labels):
    """Filter label sequences longer than max length"""
    return len(labels) < max_label_length

vectorized_datasets = vectorized_datasets.filter(filter_labels, input_columns=["labels"])

trainer = Seq2SeqTrainer(train_dataset=vectorized_datasets["train"], ...)
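
For completeness, the full trainer instantiation would look roughly like the sketch below. It assumes the usual objects from the fine-tuning blog (training_args, model, data_collator, compute_metrics and processor) are already defined in your notebook, and that your dataset has train/test splits:

from transformers import Seq2SeqTrainer

# Instantiate the trainer only *after* the filter step above,
# so it never sees label sequences longer than the model's max length
trainer = Seq2SeqTrainer(
    args=training_args,
    model=model,
    train_dataset=vectorized_datasets["train"],
    eval_dataset=vectorized_datasets["test"],
    data_collator=data_collator,
    compute_metrics=compute_metrics,
    tokenizer=processor.feature_extractor,
)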

Thanks!

Hi @sanchit-gandhi, I got the same problem when trying to fine-tune CamemBERT. I used the filter as you suggested and it works. However, the filter has removed too much of my dataset, and the model’s accuracy is now really bad. Is there another way to deal with this? (As far as I can tell, my error comes from /transformers/models/camembert/modeling_camembert.py, line 871, in forward.) Thanks in advance for your help.

Hey @maitrang!

Welcome to the forum and thanks for opening up your first question post :hugs: Awesome to have you here!

What you can do is first increase the generation max length to some suitably large value (e.g. 1024):

model.config.max_length = 1024

And then perform the filtering stage. By increasing the max length, we raise the filter threshold for our dataset and thus filter out less of it, which gives us more data to train on. However, it also increases the memory requirements for training, since we may now have longer sequences in our training data.
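
Putting the two steps together, here’s a minimal sketch (reusing the model, filter_labels and vectorized_datasets names from above):

# Raise the generation max length *before* filtering
model.config.max_length = 1024
max_label_length = model.config.max_length

def filter_labels(labels):
    """Keep only label sequences shorter than the raised max length"""
    return len(labels) < max_label_length

# With the higher threshold, far fewer examples are filtered out
vectorized_datasets = vectorized_datasets.filter(filter_labels, input_columns=["labels"])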

Hope that answers your question!

Hello! I’m trying to fine-tune an already fine-tuned Whisper model on my dataset. Previously, I fine-tuned it on the Common Voice dataset, and as you suggested in a previous post, I filtered it to only include data with up to 448 tokens. This gave me decent results with a Word Error Rate (WER) of 19%.

Now, I’m trying to fine-tune this model on another dataset that I created. This dataset is just a JSON file containing audio file paths and corresponding text. However, the WER got worse, reaching 60%. Can you explain why this might have happened?

Additionally, I noticed that after filtering to 448 tokens, I lost a significant amount of data in my dataset. Is there a way to increase this limit so that I don’t have to cut off most of the data? What should I do now—should I start fine-tuning Whisper from scratch by combining both datasets, or did I make a mistake during the second round of fine-tuning?