Which Transformers model is suitable for converting video to text and then summarizing that text?

How can we develop an AI-driven service that summarizes content from sources such as newsletters and social media accounts, tailored to each user's preferences and time constraints? What challenges might arise in accurately capturing the relevant information, and how could we measure the service's effectiveness? Most importantly, which AI model or technology is best suited to this problem, given the need for nuanced understanding and personalization of content?