There are a lot of question-answering models available, but all of them are text-based.
I am looking for models that can work with voice, for use in a virtual assistant.
- It will be used for question answering
- It should remember the name of the person it is interacting with
- Speech-to-text does not work well on person names
- Chat models need too much data to train
Which models are available for voice-to-voice interaction? What are the best options for the above requirements?
I don’t think a single model would be enough. We have not reached that level of sophistication, and I am not sure it is even possible. You have multiple goals here, and you should use a specialised model for each: for example, one model for speech-to-text, a separate model for extracting names from the text, and so on.
An added advantage is that this keeps the system modular, so you can focus on improving the weakest link in the pipeline separately. Remember that models suffer from CACE (Changing Anything Changes Everything), so it is better to avoid models that do too many things.
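
To make the modular idea concrete, here is a minimal sketch in Python of how such a pipeline could be wired together. Everything in it is an assumption for illustration: the `Assistant` class, the `extract_name` helper and the three `fake_*` stages are hypothetical names, and the stages are stubs that pass text around as bytes rather than real speech or QA models. The regex name extractor is only a toy stand-in for a proper name-extraction model.

```python
import re
from dataclasses import dataclass
from typing import Callable, Optional

# Each stage is a plain callable, so any one of them can be swapped
# for a real model without touching the others.
SpeechToText = Callable[[bytes], str]
Answerer = Callable[[str, Optional[str]], str]
TextToSpeech = Callable[[str], bytes]

# Toy stand-in for a name-extraction model (assumption: the user says
# "my name is <name>"). A real system would use a dedicated model here.
NAME_PATTERN = re.compile(r"\bmy name is\s+([A-Za-z]+)", re.IGNORECASE)


def extract_name(text: str) -> Optional[str]:
    """Return the name introduced in the text, or None if there is none."""
    match = NAME_PATTERN.search(text)
    return match.group(1).capitalize() if match else None


@dataclass
class Assistant:
    """Hypothetical voice-to-voice pipeline built from separate stages."""
    stt: SpeechToText
    answer: Answerer
    tts: TextToSpeech
    user_name: Optional[str] = None  # remembered across turns

    def handle(self, audio: bytes) -> bytes:
        text = self.stt(audio)                      # 1. speech to text
        name = extract_name(text)                   # 2. name extraction
        if name:
            self.user_name = name
        reply = self.answer(text, self.user_name)   # 3. question answering
        return self.tts(reply)                      # 4. text to speech


# Stub stages: they treat the "audio" as UTF-8 text so the sketch runs
# without any speech libraries.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_answer(text: str, name: Optional[str]) -> str:
    greeting = f"{name}, " if name else ""
    return f"{greeting}you said: {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


assistant = Assistant(stt=fake_stt, answer=fake_answer, tts=fake_tts)
first_reply = assistant.handle(b"my name is priya")
second_reply = assistant.handle(b"what time is it")
```

Because the stages only meet at `handle`, you could replace `fake_stt` with a real speech-to-text model, or the regex with a proper name-extraction model, and evaluate that one change in isolation.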