Welcome to the fourth session of ML for Audio Study Group!
We have a very special webinar organized for you! The Kensho team will join us to give a cool presentation of pyctcdecode.
Topic: pyctcdecode: A simple and fast speech-to-text prediction decoding algorithm
Speakers
Raymond Grossman (LinkedIn: www.linkedin.com/in/raymond-grossman-bb4664114)
Raymond works as a machine learning engineer at Kensho Technologies, specializing in speech and natural language domains. Prior to coming to Kensho, he studied mathematics at Princeton and was an avid Kaggler under the moniker @ToTrainThemIsMyCause.
Jeremy Lopez (LinkedIn: https://www.linkedin.com/in/jeremy-lopez-9107b613a)
Jeremy is a machine learning engineer at Kensho Technologies and has worked on a variety of different topics including search and speech recognition. Before working at Kensho, he earned a PhD in experimental particle physics at MIT and continued doing physics research as a postdoc at the University of Colorado Boulder.
I was wondering how the hotword boosting is implemented. Is it simply changing the probabilities of the language model by a factor of x or is it something fancy?
Hi! I am also interested in this topic - how is it implemented, and what is the maths behind it? Please feel free to really dig deep into the details! Unfortunately I won't be able to join this afternoon, but will watch the stream afterwards!
A user is having issues where he's not seeing meaningful differences when using hotwords, even when upweighting the words to a very large number (like 9999999.0). I tried this myself and had the same experience. Can you please elaborate on this issue, and on whether you have made any attempts to make it easier for users to fine-tune their LMs for this specific purpose?
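For readers following along before the session, the usual intuition behind hotword boosting is an additive bonus in log space to any beam that emits a hotword, rather than a multiplication of the language model's probabilities. The sketch below illustrates that idea only; the function names, numbers, and scoring rule are made up for illustration and are not pyctcdecode's actual internals.

```python
import math

def hotword_bonus(word, hotwords, hotword_weight=10.0):
    # Flat log-space bonus added to a beam's score when it emits a hotword.
    # Illustrative rule only, not pyctcdecode's implementation.
    return hotword_weight if word in hotwords else 0.0

def score_candidate(words, word_logprobs, hotwords, hotword_weight=10.0):
    # Total beam score = summed word log-probabilities + hotword bonuses.
    return sum(word_logprobs[w] for w in words) + sum(
        hotword_bonus(w, hotwords, hotword_weight) for w in words
    )

# Two competing transcripts; "kensho" is unlikely under the LM but is a hotword.
logprobs = {"can": math.log(0.20), "show": math.log(0.20), "kensho": math.log(0.001)}
plain = score_candidate(["can", "show"], logprobs, set())
boosted = score_candidate(["kensho"], logprobs, {"kensho"})
print(boosted > plain)  # True: the bonus lets the hotword beat a likelier beam
```

Note that under this additive view, a bonus only matters if the hotword's candidate beam survives pruning at all, which may be one reason extreme weights do not always change the output.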
Hi. Thanks a lot for organizing this study group.
1- Could you explain the different decoding approaches, such as Viterbi, WFST, and beam search? What are the differences?
Please compare them in terms of accuracy and efficiency, too.
2- How should we choose the beam size for beam search? What is the best value or range for beam size, especially if we want to compare different methods in a research paper?
3- Is beam size related to the acoustic model? Is it true that some models need a larger beam size for generating reasonable text sequences?
4- How to choose the number of subword pieces in BPE for decoding?
Thanks again.