Metrics for temporal consistency

What are good metrics for evaluating the temporal consistency of object masks across the frames of a video?