This pipeline is very slow and computationally expensive, and the results are not good at all. I will publish this work on GitHub soon for further discussion. For now, we are trying to simplify the pipeline, either by using SpeechT5 for end-to-end speech translation or by replacing the first three models with a single model that translates English audio directly to Arabic text. We are also working on the first open-source automatic video dubbing system, starting with English to Arabic and vice versa, with support for other languages to be added later.