ML for Audio Study Group - pyctcdecode (Jan 18)

Welcome to the fourth session of ML for Audio Study Group! :loud_sound: :loud_sound:

We have a very special webinar organized for you! The Kensho team will join us to give a presentation on pyctcdecode.

Topic: pyctcdecode: A simple and fast speech-to-text prediction decoding algorithm

Speakers

  • Raymond Grossman (LinkedIn: www.linkedin.com/in/raymond-grossman-bb4664114)
    Raymond works as a machine learning engineer at Kensho Technologies, specializing in speech and natural language domains. Prior to coming to Kensho, he studied mathematics at Princeton and was an avid Kaggler under the moniker @ToTrainThemIsMyCause.
  • Jeremy Lopez (LinkedIn: https://www.linkedin.com/in/jeremy-lopez-9107b613a)
    Jeremy is a machine learning engineer at Kensho Technologies and has worked on a variety of different topics including search and speech recognition. Before working at Kensho, he earned a PhD in experimental particle physics at MIT and continued doing physics research as a postdoc at the University of Colorado Boulder.

Suggested Resources (If you want to jump ahead)

How to join

You can post all your questions in this topic! They will be answered during the session.


I was wondering how the hotword boosting is implemented. Is it simply changing the probabilities of the language model by a factor of x or is it something fancy?


What is the future roadmap for pyctcdecode?

  • How many hotwords can we add to the LM?
  • How long does the build take?
  • Can pyctcdecode handle foreign languages such as Korean?

Hi! I am also interested in this topic - how is it implemented and what is the maths behind it? Please feel free to really dig deep into the details :slight_smile: Unfortunately I won't be able to join this afternoon, but will watch the stream afterwards!

Furthermore, I stumbled upon this discussion on your GitHub: Difficulty seeing meaningful changes with hotword boosting · Issue #18 · kensho-technologies/pyctcdecode · GitHub

A user is having issues where he's not seeing meaningful differences when using hotwords, even when upweighting the words to a very large number (like 9999999.0). I tried this myself and had the same experience. Can you please elaborate on this issue and on whether you have made any attempts to make it easier for users to fine-tune their LMs for this specific purpose?

Hi. Thanks a lot for organizing this study group.
1- Could you explain the different decoding approaches, such as Viterbi, WFST, and beam search? What are the differences?
Please compare them in terms of accuracy and efficiency, too.

2- How should we choose the beam size for beam search? What is the best value or range for beam size, especially when comparing different methods in a research paper?

3- Is beam size related to the acoustic model? Is it true that some models need a larger beam size to generate reasonable text sequences?

4- How should we choose the number of subword pieces in BPE for decoding?
Thanks again.
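(To make question 1 concrete: greedy/Viterbi-style best-path decoding corresponds to keeping a single hypothesis, while beam search keeps the top-k prefixes per frame, trading compute for accuracy. Below is a minimal, self-contained CTC prefix beam search in pure Python, a sketch for illustration only, not pyctcdecode's implementation, with no language model and all function names invented for the example:)

```python
import math
from collections import defaultdict

NEG_INF = float("-inf")

def logsumexp(*xs):
    """Numerically stable log(sum(exp(x) for x in xs))."""
    m = max(xs)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(x - m) for x in xs))

def ctc_prefix_beam_search(log_probs, labels, beam_width=8, blank=0):
    """Toy CTC prefix beam search without a language model.

    log_probs: T x V nested lists of per-frame log-probabilities.
    labels: sequence mapping label ids to characters; labels[blank] is blank.
    beam_width=1 approximates greedy decoding; larger beams keep more
    hypotheses alive at higher cost.
    """
    # Each prefix keeps two scores: ending in blank (p_b) and non-blank (p_nb).
    beams = {(): (0.0, NEG_INF)}  # empty prefix, reached via blanks only
    for frame in log_probs:
        next_beams = defaultdict(lambda: (NEG_INF, NEG_INF))
        for prefix, (p_b, p_nb) in beams.items():
            total = logsumexp(p_b, p_nb)
            for c, lp in enumerate(frame):
                if c == blank:
                    # Emitting blank keeps the prefix unchanged.
                    nb_b, nb_nb = next_beams[prefix]
                    next_beams[prefix] = (logsumexp(nb_b, total + lp), nb_nb)
                elif prefix and prefix[-1] == c:
                    # Repeated symbol: staying on it extends p_nb only;
                    # a genuinely new occurrence requires a blank in between.
                    nb_b, nb_nb = next_beams[prefix]
                    next_beams[prefix] = (nb_b, logsumexp(nb_nb, p_nb + lp))
                    ext = prefix + (c,)
                    e_b, e_nb = next_beams[ext]
                    next_beams[ext] = (e_b, logsumexp(e_nb, p_b + lp))
                else:
                    ext = prefix + (c,)
                    e_b, e_nb = next_beams[ext]
                    next_beams[ext] = (e_b, logsumexp(e_nb, total + lp))
        # Prune to the top-scoring prefixes (the "beam").
        beams = dict(sorted(next_beams.items(),
                            key=lambda kv: logsumexp(*kv[1]),
                            reverse=True)[:beam_width])
    best = max(beams, key=lambda p: logsumexp(*beams[p]))
    return "".join(labels[i] for i in best)
```

For example, with labels `"_ab"` (blank first) and two frames peaked on `a` then `b`, the decoder returns `"ab"`; shrinking `beam_width` shows how aggressive pruning can drop the eventual winner.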

During the training of an STT system, let's say Wav2Vec 2.0, do we include the language model (i.e., do CTC decoding with an LM) during training?

Can we use it with RNN-based and attention-based models to generate text?

Is it possible to use acoustic models with phoneme output with pyctcdecode and add a lexicon?
