How to get a video embedding from a pretrained transformer?

Julascar · March 14, 2023, 2:22pm

Hello,
I would like to use a video transformer (like VideoMAE) to get an embedding of a video (equivalent to the CLS token). Using the demo from Hugging Face, I get:
outputs.last_hidden_state.shape = torch.Size([1, 1568, 768])
I thought the first token was the CLS token, but that would leave 1567 patch tokens, and 1567 is a prime number, so it cannot correspond to the patch embeddings of the video.
Can someone help me?
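
For reference, here is a rough sketch of where the 1568 might come from. It assumes the demo checkpoint uses the default VideoMAE base settings (16 frames, tubelet size 2, 224x224 frames, 16x16 patches), which I have not verified against the actual config. The helper `num_video_tokens` is a hypothetical name, not part of the library. Under those assumptions all 1568 positions are patch tokens and there is no CLS token, so averaging over the token dimension is one plausible way to get a single video embedding.

```python
# Hypothetical helper (not part of transformers): counts the patch tokens
# produced when a video is cut into tubelets of
# tubelet_size frames x patch_size x patch_size pixels.
# Default values are ASSUMED to match the VideoMAE base configuration.
def num_video_tokens(num_frames=16, tubelet_size=2, image_size=224, patch_size=16):
    temporal = num_frames // tubelet_size        # 16 / 2 = 8 temporal slices
    spatial = (image_size // patch_size) ** 2    # (224 / 16)^2 = 196 patches per slice
    return temporal * spatial


n_tokens = num_video_tokens()
print(n_tokens)  # 1568, i.e. 8 * 14 * 14, with no extra CLS position

# With the real model output, a mean over the token dimension would give
# one vector per video (assumption: no CLS token is present):
#   embedding = outputs.last_hidden_state.mean(dim=1)  # shape [1, 768]
```

If the count matches, then `last_hidden_state[:, 0]` would just be the first patch token rather than a summary token.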
