Make a 5-minute video with speech from a text story

Is there a model on Hugging Face that can generate 5-minute videos with speech for children based on a textual story?
I can host it on my PC and use it.