Speaker Isolation Toolkit is now in RC1

ScottishHaze · April 6, 2025, 5:31pm

Hello my friends!

Over the past year, I’ve been developing a beginner-friendly toolkit for audio engineers who want to build voice datasets of TV and movie actors. The toolkit is free and available on my GitHub: SID Toolkit

You can give the toolkit a large folder of video files (or a single file, such as a movie), and it extracts the English audio track to a mono WAV file. After you isolate the vocals with UVR (Ultimate Vocal Remover) or a similar tool, the diarization script uses your Hugging Face token (HF_Token) to work out who speaks when, and writes the speaker data for each audio file to a JSON file.
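The post doesn't show how the extraction step works. As a rough illustration only, here is a minimal Python sketch that builds ffmpeg commands for a folder of videos. It assumes ffmpeg is on your PATH and that each file has exactly one audio track tagged `language=eng`. The function names, the list of video extensions, and the 16 kHz sample rate are hypothetical choices, not the toolkit's actual settings.

```python
from pathlib import Path
import shlex
import subprocess
from typing import List

# Hypothetical set of input formats; adjust for your library.
VIDEO_EXTS = {".mkv", ".mp4", ".avi", ".mov"}


def build_extract_cmd(video: Path, out_dir: Path, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that writes the English audio of `video` to a mono WAV."""
    out_wav = out_dir / (video.stem + ".wav")
    return [
        "ffmpeg", "-y",
        "-i", str(video),
        # Select audio streams whose metadata tag is language=eng.
        # Assumption: exactly one such track (a WAV file holds a single stream).
        "-map", "0:a:m:language:eng",
        "-ac", "1",                  # downmix to mono
        "-ar", str(sample_rate),     # resample (16 kHz is an assumed choice)
        "-c:a", "pcm_s16le",         # 16-bit PCM, the usual WAV encoding
        str(out_wav),
    ]


def extract_folder(in_dir: Path, out_dir: Path, dry_run: bool = True) -> List[List[str]]:
    """Run (or just print, when dry_run is True) one ffmpeg command per video file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    cmds = []
    for video in sorted(in_dir.iterdir()):
        if video.suffix.lower() in VIDEO_EXTS:
            cmd = build_extract_cmd(video, out_dir)
            cmds.append(cmd)
            if dry_run:
                print(shlex.join(cmd))
            else:
                subprocess.run(cmd, check=True)
    return cmds
```

With `dry_run=True` the sketch only prints the commands, which is a cheap way to check the stream mapping before converting a whole series.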

Once you have manually identified your target speaker in about 5 to 20 files (more if the show has many different speakers), you can use the cross-reference script to find the same speaker in all files in your working directory.
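The post doesn't describe how the cross-reference script matches speakers across files. One common approach, sketched here purely as an assumption, is to average speaker embeddings (fixed-length voice vectors from any speaker-embedding model) taken from your hand-labelled files, then pick the speaker in each new file whose embedding is most similar by cosine similarity. The function names, the toy 3-dimensional vectors, and the 0.7 threshold are all hypothetical; real embeddings have hundreds of dimensions.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: List[Vector]) -> List[float]:
    """Element-wise mean of the reference embeddings."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def match_speaker(
    file_speakers: Dict[str, Vector],  # speaker label -> embedding, for one file
    reference: Vector,
    threshold: float = 0.7,            # hypothetical cut-off; tune on your data
) -> Optional[str]:
    """Return the label most similar to the reference, or None if none clears the threshold."""
    best_label, best_score = None, threshold
    for label, emb in file_speakers.items():
        score = cosine(emb, reference)
        if score >= best_score:
            best_label, best_score = label, score
    return best_label


# Toy example: the target speaker as labelled by hand in two reference files.
refs = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.1]]
target = centroid(refs)
episode = {"SPEAKER_00": [0.0, 1.0, 0.0], "SPEAKER_01": [0.95, 0.15, 0.05]}
print(match_speaker(episode, target))  # SPEAKER_01
```

Averaging several references smooths out per-episode variation in a voice, which may be one reason more reference files help when a show has many speakers.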

Finally, the isolation script cuts the audio files into clips that contain only the speaker you identified, removing silences and any other speakers so that a dataset can be created.
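To make the isolation step concrete, here is a minimal sketch, not the toolkit's code, assuming diarization segments are available as `(start_seconds, end_seconds, speaker_label)` tuples. It keeps only the target speaker's segments, drops any that overlap another speaker, splits long segments into clips of at most 5 seconds, discards pieces shorter than 1 second, and cuts the clips from the WAV with Python's standard `wave` module. The 1 to 5 second bounds come from the post; the function names and the overlap rule are illustrative assumptions.

```python
import wave
from pathlib import Path
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_s, end_s, speaker_label)


def plan_clips(
    segments: List[Segment],
    target: str,
    min_len: float = 1.0,
    max_len: float = 5.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) clip boundaries covering only `target`'s clean speech."""
    others = [s for s in segments if s[2] != target]
    clips = []
    for start, end, speaker in segments:
        if speaker != target:
            continue
        # Skip target speech that overlaps anyone else, so clips stay single-speaker.
        if any(start < o_end and o_start < end for o_start, o_end, _ in others):
            continue
        t = start
        # Emit chunks of at most max_len; a final remainder shorter than min_len is dropped.
        while end - t >= min_len:
            clip_end = min(t + max_len, end)
            clips.append((round(t, 3), round(clip_end, 3)))
            t = clip_end
    return clips


def write_clips(wav_path: Path, clips: List[Tuple[float, float]], out_dir: Path) -> None:
    """Cut each (start, end) range out of wav_path into its own numbered WAV file."""
    out_dir.mkdir(parents=True, exist_ok=True)
    with wave.open(str(wav_path), "rb") as src:
        rate = src.getframerate()
        params = src.getparams()
        for i, (start, end) in enumerate(clips):
            src.setpos(int(start * rate))
            frames = src.readframes(int((end - start) * rate))
            out = out_dir / f"{wav_path.stem}_{i:04d}.wav"
            with wave.open(str(out), "wb") as dst:
                dst.setparams(params)  # frame count in the header is fixed up on close
                dst.writeframes(frames)
```

For example, `plan_clips([(0.0, 12.0, "A"), (12.5, 13.0, "A"), (20.0, 23.0, "B"), (22.0, 25.0, "A")], "A")` yields `[(0.0, 5.0), (5.0, 10.0), (10.0, 12.0)]`: the 0.5 s segment is too short and the last one overlaps speaker B.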

As an example, I used the TV series House. After converting all 187 episodes to mono WAV files, I manually identified House in 7 of them. The toolkit then produced a dataset of 817 WAV files, each 1 to 5 seconds long and trimmed so that only House is speaking.

Please post bug reports and other feedback on GitHub so I can keep improving it!
