Two way translation Speech to Speech model EN-DE

frogcho123 · December 5, 2022, 9:09pm

Hi

I am working on a project whose goal is to build a model that can translate speech to speech in real time, both EN-DE and DE-EN.

I have found Facebook's one-way translation model: facebook/textless_sm_cs_en · Hugging Face

The problem is that I tried the other direction with facebook/s2t-wav2vec2-large-en-de · Hugging Face, but it seems to crash.

I was thinking of using one model to convert EN speech to text, then translating the EN text to DE text, and finally using a text-to-speech model to output the DE speech.
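A minimal sketch of that three-step cascade, just to show how the two directions could share one structure. Everything here is an assumption for illustration: the `CascadeS2ST` class, the `Audio` alias and the `fake_*` stages are hypothetical placeholders, not real models. In practice each stage would wrap an actual speech recognition, translation or text-to-speech model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type aliases: audio is treated as raw bytes for simplicity.
Audio = bytes
ASR = Callable[[Audio], str]   # speech -> source-language text
MT = Callable[[str], str]      # source-language text -> target-language text
TTS = Callable[[str], Audio]   # target-language text -> speech


@dataclass
class CascadeS2ST:
    """One translation direction: speech -> text -> translated text -> speech."""
    asr: ASR
    mt: MT
    tts: TTS

    def translate(self, audio: Audio) -> Audio:
        source_text = self.asr(audio)
        target_text = self.mt(source_text)
        return self.tts(target_text)


# Placeholder stages so the sketch runs without downloading any model.
def fake_asr(audio: Audio) -> str:
    return audio.decode("utf-8")


def fake_mt_en_de(text: str) -> str:
    return {"hello": "hallo"}.get(text, text)


def fake_mt_de_en(text: str) -> str:
    return {"hallo": "hello"}.get(text, text)


def fake_tts(text: str) -> Audio:
    return text.encode("utf-8")


# Two-way translation is simply two instances with different stages.
en_de = CascadeS2ST(asr=fake_asr, mt=fake_mt_en_de, tts=fake_tts)
de_en = CascadeS2ST(asr=fake_asr, mt=fake_mt_de_en, tts=fake_tts)
```

Keeping the stages as plain callables means any single model can be swapped out later without touching the rest of the chain.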

I am not sure how to continue from here.
Can you give me some tips?

Thank you and regards

seba3y · September 26, 2023, 11:35am

I worked on this task before, but translating from English to Arabic. We built a cascaded pipeline consisting of:

Speech recognition for En, using a Wav2Vec model.
Punctuation restoration, because the Wav2Vec output has no punctuation; we use the deepmultilingualpunctuation model.
Machine translation from En to Ar, using an mbart model.
Tashkeel (Arabic diacritics) restoration: you can learn more here
Text-to-speech, using a fastSpeech2 model.
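With five models chained like this, it helps to know which stage costs the most time. Below is a rough sketch of timing each stage separately. The `run_timed` helper and the lambda stages are hypothetical stand-ins, not the actual models from the list above; real stages would call the corresponding model instead.

```python
import time
from typing import Any, Callable, Dict, List, Tuple

# A stage is a (name, function) pair; each function takes the previous output.
Stage = Tuple[str, Callable[[Any], Any]]


def run_timed(stages: List[Stage], data: Any) -> Tuple[Any, Dict[str, float]]:
    """Run the stages in order and record the wall-clock seconds of each one."""
    timings: Dict[str, float] = {}
    for name, fn in stages:
        start = time.perf_counter()
        data = fn(data)
        timings[name] = time.perf_counter() - start
    return data, timings


# Placeholder stages that mirror the five steps of the cascade.
stages: List[Stage] = [
    ("asr", lambda audio: "hello how are you"),
    ("punctuation", lambda text: text.capitalize() + "?"),
    ("translation", lambda text: f"<ar>{text}</ar>"),
    ("tashkeel", lambda text: text),
    ("tts", lambda text: text.encode("utf-8")),
]

output, timings = run_timed(stages, b"\x00\x01")
slowest = max(timings, key=timings.get)
print(f"slowest stage: {slowest}")
```

Measuring per stage like this shows whether merging or replacing a particular model is likely to pay off before any of them is changed.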

This pipeline is very slow and computationally expensive, and the results are not good at all. I will publish this work on GitHub soon for further discussion. At the moment we are trying to simplify the pipeline, either by using SpeechT5 for end-to-end speech translation or by replacing the first three models with a single model that translates En audio to Ar text. We are also working on the first open-source automatic video dubbing system, from En to Ar and vice versa at first, with support for other languages to be added later.
