Video Classification

Isabella · May 16, 2022, 10:43am

Hi everyone,
I am starting to look into the task of classifying videos, trying to understand what approaches are currently available.

Naively speaking, I guess one could randomly (maybe better, uniformly) sample N frames from a video, perform classification on each of them, and then aggregate predictions (most frequent prediction, most confident prediction, etc.). This may be reasonable for simple classification tasks (e.g. is there a cat in this video? Is the video set indoors or outdoors?).
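To make the naive baseline concrete, here is a minimal sketch of the sampling and aggregation steps I have in mind. The function names are my own placeholders, and the per-frame predictions are assumed to come from some image classifier that returns a (label, confidence) pair for each sampled frame; decoding the video and running the classifier are left out.

```python
from collections import Counter


def uniform_frame_indices(num_frames, n):
    """Pick up to n frame indices spread evenly over a video of num_frames frames."""
    if num_frames <= 0 or n <= 0:
        return []
    n = min(n, num_frames)  # cannot sample more frames than the video has
    # Split the video into n equal segments and take the middle frame of each.
    return [int((i + 0.5) * num_frames / n) for i in range(n)]


def majority_vote(predictions):
    """Return the label predicted for the largest number of frames.

    predictions: list of (label, confidence) pairs, one per sampled frame
    (hypothetical output of a per-frame image classifier).
    """
    counts = Counter(label for label, _ in predictions)
    return counts.most_common(1)[0][0]


def most_confident(predictions):
    """Return the label of the single most confident frame prediction."""
    return max(predictions, key=lambda p: p[1])[0]


if __name__ == "__main__":
    print(uniform_frame_indices(100, 4))  # [12, 37, 62, 87]
    frame_preds = [("cat", 0.60), ("dog", 0.99), ("cat", 0.70)]
    print(majority_vote(frame_preds))     # cat
    print(most_confident(frame_preds))    # dog
```

The two aggregation rules can disagree, as in the toy example above, so which one to use probably depends on whether a single confident frame should be enough to decide the label for the whole video.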

On the other hand, this approach would discard both the temporal information conveyed by the frame sequence and the audio track (sound/speech). Capturing the former requires a model that can process sequences, and capturing the latter requires a multi-modal one.

So I was wondering if any of you can point out examples of models that have been proposed/used for video classification in any of these directions.
I tried browsing the Hugging Face directory but could not find a “video classification” task category, and my impression (after some web searching) is that this topic is generally less covered than image or text classification.

Any pointers or suggestions are very much appreciated.
