Creating an object detection dataset from one folder of several video frames

SDK1986 · July 18, 2023, 8:42am

Hi all,

I have spent the last few weeks trying to get my head around Hugging Face and its possibilities. I have been unable to figure out how to create a dataset from one folder containing just under 18,000 frames taken from several videos. I exported the frames using CVAT for videos.
I have one XML file with all the bounding box point data, and I want to create a dataset that can then be split into train, test, and validation sets.

After that, I hope to figure out how to use the pipeline() function to fine-tune a model. I am stuck and need some guidance on what to do next.

Thank you.

SDK1986 · August 2, 2023, 2:47am

For others who get stuck: I have now reached the stage where I can begin fine-tuning a pre-trained model. I got there by following these steps.

1. Find the annotation format that the selected pre-trained model expects. In my case the model was DETR, which required one JSON file containing all the data in a specific format. I found that format by looking at DETR (huggingface.co).
2. Split the images into train, validation, and test datasets, as the Hugging Face Hub only allows 10,000 files per dataset.
3. For each dataset, create a JSON file in the correct format with the frame data. I wrote a bespoke Python script to do this.
4. Write a script to fine-tune the selected pre-trained model.
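
For anyone attempting the splitting and JSON-conversion steps, here is a minimal sketch. It is not the bespoke script mentioned above. It assumes the XML follows the "CVAT for video" layout (`<track label=...>` elements containing `<box frame=... xtl ytl xbr ybr outside>` elements) and that the target is a COCO-style JSON file, which is an assumption about the format the DETR documentation describes. The function names, the `frame_{:06d}.png` file-name template, and the fixed 1920x1080 image size are all hypothetical and would need adjusting to the real export.

```python
import json
import random
import xml.etree.ElementTree as ET
from collections import defaultdict


def cvat_video_xml_to_boxes(xml_text):
    """Return {frame_number: [(label, x, y, w, h), ...]}.

    Assumes CVAT-for-video XML: <track label=...> wrapping <box .../> elements.
    Frames with no visible box do not appear in the result.
    """
    root = ET.fromstring(xml_text)
    boxes = defaultdict(list)
    for track in root.iter("track"):
        label = track.get("label")
        for box in track.iter("box"):
            if box.get("outside") == "1":  # object is not visible in this frame
                continue
            xtl, ytl = float(box.get("xtl")), float(box.get("ytl"))
            xbr, ybr = float(box.get("xbr")), float(box.get("ybr"))
            # COCO boxes are [x, y, width, height] in pixels
            boxes[int(box.get("frame"))].append((label, xtl, ytl, xbr - xtl, ybr - ytl))
    return dict(boxes)


def split_frames(frames, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle frame numbers reproducibly and cut them into three splits."""
    frames = sorted(frames)
    random.Random(seed).shuffle(frames)
    n_val = int(len(frames) * val_frac)
    n_test = int(len(frames) * test_frac)
    return {
        "validation": frames[:n_val],
        "test": frames[n_val:n_val + n_test],
        "train": frames[n_val + n_test:],
    }


def to_coco(boxes, frames, categories, width, height, name_template="frame_{:06d}.png"):
    """Build a COCO-style dict for the given frames (file names are hypothetical)."""
    images, annotations = [], []
    for frame in sorted(frames):
        images.append({"id": frame, "file_name": name_template.format(frame),
                       "width": width, "height": height})
        for label, x, y, w, h in boxes.get(frame, []):
            annotations.append({
                "id": len(annotations),
                "image_id": frame,
                "category_id": categories[label],
                "bbox": [x, y, w, h],
                "area": w * h,
                "iscrowd": 0,
            })
    return {
        "images": images,
        "annotations": annotations,
        "categories": [{"id": i, "name": n} for n, i in categories.items()],
    }


# Tiny made-up example instead of the real 18,000-frame export.
SAMPLE_XML = """<annotations>
  <track id="0" label="car">
    <box frame="0" outside="0" xtl="10" ytl="20" xbr="110" ybr="70"/>
    <box frame="1" outside="0" xtl="12" ytl="20" xbr="112" ybr="70"/>
    <box frame="2" outside="1" xtl="14" ytl="20" xbr="114" ybr="70"/>
  </track>
  <track id="1" label="person">
    <box frame="1" outside="0" xtl="5" ytl="5" xbr="25" ybr="65"/>
  </track>
</annotations>"""

boxes = cvat_video_xml_to_boxes(SAMPLE_XML)
# One shared label -> id map so category ids agree across all splits.
labels = sorted({label for frame_boxes in boxes.values() for label, *_ in frame_boxes})
categories = {name: i for i, name in enumerate(labels)}
splits = split_frames(boxes.keys())
coco_train = to_coco(boxes, splits["train"], categories, width=1920, height=1080)
print(json.dumps(coco_train["annotations"], indent=2))
```

To write one file per split, something like `json.dump(to_coco(boxes, splits[name], categories, 1920, 1080), open(f"{name}.json", "w"))` for each split name would do. One caveat on the design: a random per-frame split puts near-identical neighbouring frames into different splits, so splitting by source video may give a more honest validation score if the frames can be traced back to their videos.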
