I am working on a project whose goal is a model that can translate speech to speech in real time, both EN-DE and DE-EN.
I have found Facebook's one-way translation model facebook/textless_sm_cs_en on Hugging Face.
The problem is that I tried the other direction with facebook/s2t-wav2vec2-large-en-de, also on Hugging Face, but it seems to crash.
I was thinking of using one model to convert EN speech to text, then translating the EN text to DE text, and finally using a text-to-speech model to output the DE speech.
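To make the idea concrete, here is a rough sketch of how I picture the three stages being chained. It is only an outline under my own assumptions: the class name CascadedS2ST and the stub functions fake_asr, fake_mt and fake_tts are hypothetical placeholders I made up, not calls from any real library, and the real speech-to-text, translation and text-to-speech models would have to be plugged in where the stubs are.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical stage signatures (my own naming, not from any library).
Transcribe = Callable[[bytes], str]   # source speech -> source text
Translate = Callable[[str], str]      # source text -> target text
Synthesize = Callable[[str], bytes]   # target text -> target speech


@dataclass
class CascadedS2ST:
    """Chains speech-to-text, text translation and text-to-speech."""
    transcribe: Transcribe
    translate: Translate
    synthesize: Synthesize

    def __call__(self, audio: bytes) -> bytes:
        text = self.transcribe(audio)        # e.g. EN speech -> EN text
        translated = self.translate(text)    # e.g. EN text -> DE text
        return self.synthesize(translated)   # e.g. DE text -> DE speech


# Stub stages so the sketch runs without downloading any model.
# Assumption: real models would replace each of these.
def fake_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_mt(text: str) -> str:
    return {"hello": "hallo"}.get(text, text)


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


en_to_de = CascadedS2ST(fake_asr, fake_mt, fake_tts)
```

The DE-EN direction would be a second instance of the same class with the stages swapped for the other language pair. I assume the latency of three models in a row is the main risk for the real-time requirement.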
I am not sure how to continue from here.
Can you give me some tips?
Thank you and regards