[Open-to-the-community] Whisper fine-tuning event

Hey @steja! This is pretty unlucky :sweat_smile: It means there's a sample in the training set whose label sequence is 504 tokens long, but the model has a max length of 448. Could you add an extra filter step to your dataset before you instantiate the Trainer:

# The model's maximum target length (448 for Whisper)
max_label_length = model.config.max_length

def filter_labels(labels):
    """Keep only samples whose label sequence fits within the model's max length."""
    return len(labels) < max_label_length

# Drop the over-length samples from every split before training
vectorized_datasets = vectorized_datasets.filter(filter_labels, input_columns=["labels"])

This should fix the issue!
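If you want to double-check how many samples get dropped, you can run the same filter while logging the row counts before and after. This is a minimal sketch, assuming vectorized_datasets is a 🤗 Datasets DatasetDict and that filter_labels and max_label_length are defined as above (num_rows_before is just a hypothetical helper name):

# Hypothetical sanity check: count how many rows each split loses to the filter
num_rows_before = {split: len(ds) for split, ds in vectorized_datasets.items()}

vectorized_datasets = vectorized_datasets.filter(filter_labels, input_columns=["labels"])

for split, ds in vectorized_datasets.items():
    removed = num_rows_before[split] - len(ds)
    print(f"{split}: removed {removed} sample(s) with {max_label_length}+ label tokens")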
