Check out this streamlit app I call speech-to-chat where given an audio file or a youTube link it will
-
- Diarize speech
-
- Transcribe each speaker’s audio track
-
- Set up an LLM chat with the transcript loaded into its knowledge database
All that using OpenAI Whisper and gpt , and Hugging Face pyannote api developed by the amazing Hervé Bredin.
I regularly watch long form educational videos and this tool enables me to
- Quickly summarize / analyze the content
- Identify main bullet points and find the corresponding timestamps
- Generate SEO keywords and more…
I learned a lot while building this app and it is mind-boggling the capabilities that are now available for content processing vs 5 years ago. I have not found an existing service/vendor that does the above, please share if you have seen similar tools.
The free version of the app is deployed on
- HuggingFace Spaces : Speech To Chat - a Hugging Face Space by kobakhit
Below resources were invaluable:
- phData How to Guides
- pyannote
- Streamlit Forum
Please reach out if you have any questions or feedback!