Speech recognition processing max_length

Hello. I am working on an automatic speech recognition project, but most of the models state that the maximum input they can process is 30 seconds. If the original audio is longer than 30 seconds, I would like to know how to handle it. For example, is there a way to handle it in code other than manually splitting the original data?

This might have something to do with it.
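If what is meant here is the chunked long-form inference that the transformers ASR pipeline provides, a minimal sketch would look like this; the checkpoint name and audio path are only placeholders, not something from this thread:

```python
from transformers import pipeline

# Chunked long-form transcription: the pipeline splits the audio into
# overlapping 30 s windows and stitches the partial results back together.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",   # placeholder checkpoint
    chunk_length_s=30,
)

# "long_audio.wav" is a placeholder path for your own recording.
result = asr("long_audio.wav", return_timestamps=True)
print(result["text"])
```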

Thanks for pointing out a good approach. But I was referring to preprocessing the dataset before fine-tuning, so I was wondering whether there is a way to split the paired audio and text into matching chunks.

Would it be a pre-processing thing?

Thanks for the great resources. They are correct, but they are things I have already referenced. According to the tutorial, 30 seconds is the maximum: the length filter returns false for anything longer, so I can’t use that data at all. It says to simply discard anything over 30 seconds.
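For context, the kind of length filter I mean looks roughly like this; the dummy dataset and the function name below follow the usual fine-tuning tutorial convention and are only illustrative:

```python
from datasets import load_dataset, Audio

# Tiny public dataset used only as an illustration.
ds = load_dataset("hf-internal-testing/librispeech_asr_dummy", "clean", split="validation")
ds = ds.cast_column("audio", Audio(sampling_rate=16_000))

MAX_INPUT_LENGTH = 30.0  # seconds

def is_audio_in_length_range(audio):
    # Duration in seconds = number of samples / sampling rate.
    return len(audio["array"]) / audio["sampling_rate"] < MAX_INPUT_LENGTH

# Anything longer than 30 s is dropped entirely, which is exactly the
# problem: the audio past the cutoff never contributes to training.
ds = ds.filter(is_audio_in_length_range, input_columns=["audio"])
```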

Would you simply split the file?
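For example, assuming a splitting utility such as pydub (which may or may not be the package meant here), a silence-based split could look like this:

```python
from pydub import AudioSegment
from pydub.silence import split_on_silence

# "long_audio.wav" is a placeholder for your own recording.
audio = AudioSegment.from_file("long_audio.wav")

# Cut at pauses; note this does not guarantee every chunk is under 30 s,
# so unusually long chunks may need a further split.
chunks = split_on_silence(
    audio,
    min_silence_len=500,              # a pause of at least 0.5 s counts as silence
    silence_thresh=audio.dBFS - 16,   # threshold relative to the average loudness
    keep_silence=200,                 # keep a little silence at the edges
)

for i, chunk in enumerate(chunks):
    chunk.export(f"chunk_{i:03d}.wav", format="wav")
```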

That’s a nice package. So I guess the only way to divide text into chunks is to do it myself?

An integrated dataset-processing library would be ideal, but the current ones don’t seem to support this…
Even though the audio and the text are bundled together, from the program’s point of view they are completely different kinds of information, so you need something that ties the two together and finds the split points.
In practice that means running speech recognition (or forced alignment) at the dataset-creation stage, which is hard to do by hand.
I wonder if anyone has made such a tool… (a rough sketch of the idea is below)
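I have not found an off-the-shelf tool, but a rough sketch of the idea, using word-level timestamps from the transformers pipeline to pick split points, might look like the following. Note that the text here comes from the model’s own transcription, not from an existing reference transcript; aligning existing transcripts would need a forced-alignment tool instead. The checkpoint and file path are placeholders.

```python
from transformers import pipeline

# Get word-level timestamps, then group words into segments of at most
# 30 s so the audio and the text can be cut at the same points.
asr = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-small",
    return_timestamps="word",
)
words = asr("long_audio.wav")["chunks"]  # [{"text": ..., "timestamp": (start, end)}, ...]

MAX_SEC = 30.0
segments, current, seg_start = [], [], None

for w in words:
    start, end = w["timestamp"]
    if seg_start is None:
        seg_start = start
    # Close the current segment once this word would push it past the limit.
    if end is not None and end - seg_start > MAX_SEC and current:
        segments.append((seg_start, current[-1]["timestamp"][1],
                         "".join(x["text"] for x in current)))
        current, seg_start = [], start
    current.append(w)

if current:
    segments.append((seg_start, current[-1]["timestamp"][1],
                     "".join(x["text"] for x in current)))

for start, end, text in segments:
    print(start, end, text)
```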

Edit:
It’s close, but there’s no function for text splitting.

Other well-known libraries for doing it manually:

Isn’t this one close?
