I have a screen recording of someone performing a series of tasks in a back-office/investment banking workflow. I want to create a structured document from this video that includes:
Only contextually relevant screenshots showing actual UI changes
Step-by-step descriptions of what is happening in the video
A final well-formatted document combining screenshots and descriptions
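
To make the first requirement concrete, this is roughly what I mean by keeping only screenshots with "actual UI changes". It is a minimal sketch, not a tested pipeline: it assumes the video has already been decoded into grayscale frames (for example with OpenCV or ffmpeg, not shown here), and the names `changed_fraction` and `select_keyframes` as well as both threshold values are hypothetical placeholders I made up for illustration.

```python
from typing import List, Sequence

# A frame is a row-major grid of grayscale pixel values (0-255).
# Assumption: decoding the video into such frames happens elsewhere.
Frame = Sequence[Sequence[int]]


def changed_fraction(a: Frame, b: Frame, pixel_delta: int = 16) -> float:
    """Share of pixels whose brightness differs by more than pixel_delta."""
    total = 0
    changed = 0
    for row_a, row_b in zip(a, b):
        for pa, pb in zip(row_a, row_b):
            total += 1
            if abs(pa - pb) > pixel_delta:
                changed += 1
    return changed / total if total else 0.0


def select_keyframes(frames: List[Frame], min_changed: float = 0.02) -> List[int]:
    """Return indices of frames that differ enough from the last kept frame.

    The first frame is always kept. Small changes (a moving cursor, a
    blinking caret) stay below min_changed and are skipped.
    """
    if not frames:
        return []
    kept = [0]
    for i in range(1, len(frames)):
        # Compare against the last *kept* frame, not the previous frame,
        # so slow gradual changes are still picked up eventually.
        if changed_fraction(frames[kept[-1]], frames[i]) >= min_changed:
            kept.append(i)
    return kept
```

The indices returned by `select_keyframes` would be the candidate screenshots; the thresholds would need tuning per recording, and a pixel-difference filter like this cannot judge whether a change is contextually relevant, which is the part I expect to need AI for.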
The goal is that a person shouldn’t need to watch the video: they should be able to understand the entire task and its step-by-step procedure just by reading the document.
What’s the best way or workflow to achieve this using AI?