Creating Batch Sizes for Video Transcription Dataset

Hello Everyone.
I am currently working on a project that involves segmenting a video transcript into homogeneous sections based on the topics discussed in the video. I am implementing a transformer model similar to the one described in this paper: https://arxiv.org/pdf/2110.07160.pdf
My dataset consists of sentences from video transcripts along with a label column whose values are 1 (denoting a change in topic) or 0 (denoting the same topic).
I have stored the entire dataset in a dataframe object (see the attached image). The dataset covers 600+ videos, which I have merged into a single dataframe.
I am currently confused about how to create batches. I know that the input to the transformer model should have the dimensions [batch size, seq len, features]. However, I am not sure how to form batches, since every video has a different number of sentences. A fixed batch of, e.g., 50 sentences could end up containing sentences from more than one video, which I think shouldn't happen :sweat_smile:. Ideally, I would want the batch size to represent the number of videos, so a batch size of 32 would mean 32 videos, just as in a computer vision problem 32 means 32 training images. Is that achievable in this case? If so, can someone guide me on how to achieve it? Your help would be greatly appreciated :smiley:
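
To make the question concrete, here is a rough sketch of what I have in mind. I am assuming the merged dataframe has a `video_id` column, a `label` column, and some columns of precomputed numeric sentence features (those names are placeholders, not my actual columns). The idea is to treat one video as one dataset item and pad each batch to the longest video in it:

```python
import torch
from torch.utils.data import Dataset
from torch.nn.utils.rnn import pad_sequence


class VideoTranscriptDataset(Dataset):
    """One item = one video (all of its sentence features and labels)."""

    def __init__(self, df, feature_cols, video_id_col="video_id", label_col="label"):
        # Split the merged dataframe back into per-video chunks.
        self.videos = []
        for _, group in df.groupby(video_id_col, sort=False):
            feats = torch.tensor(group[feature_cols].to_numpy(), dtype=torch.float)  # [num_sentences, features]
            labels = torch.tensor(group[label_col].to_numpy(), dtype=torch.float)    # [num_sentences]
            self.videos.append((feats, labels))

    def __len__(self):
        return len(self.videos)

    def __getitem__(self, idx):
        return self.videos[idx]


def collate_videos(batch):
    """Pad every video in the batch to the length of the longest one."""
    feats, labels = zip(*batch)
    lengths = torch.tensor([f.size(0) for f in feats])
    padded_feats = pad_sequence(list(feats), batch_first=True)                       # [batch, max_len, features]
    padded_labels = pad_sequence(list(labels), batch_first=True, padding_value=-1)   # -1 marks padded positions
    # True where a position is padding, matching nn.Transformer's src_key_padding_mask convention.
    padding_mask = torch.arange(padded_feats.size(1))[None, :] >= lengths[:, None]
    return padded_feats, padded_labels, padding_mask
```

With this, a `torch.utils.data.DataLoader` built with `batch_size=32` and `collate_fn=collate_videos` would yield tensors of shape [32, longest video in the batch, features], plus a padding mask I could pass to the transformer and use to ignore padded labels in the loss. Does something like this look like a reasonable approach, or is there a more standard way to do it?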