I have a database with definitions of words in Spanish, and I would like to recognize which word from that list is being spoken in an audio clip. The database has approximately 100k different words. Is there any model that can be used for that task?
I guess I could use a general speech recognition model, but is there any way to use the list of possible words to improve the accuracy? Even though the list of possible outcomes is quite big, Spanish has the advantage that “it’s read as it’s spelled”, so the pronunciation of each word can mostly be derived from its spelling.