Hi all,
I have spent the last few weeks trying to get my head around Hugging Face and its possibilities. I have been unable to figure out how to create a dataset from a single folder containing just under 18,000 video frames taken from several videos. I exported the frames using CVAT for video.
I have one XML file with all the bounding-box data, and I want to create a dataset that can then be split into train, test, and validation sets.
Then I hope to figure out how to use the pipeline() function to fine-tune a model. I am stuck and need some guidance on what to do next.
Thank you.
For others who get stuck: I have now reached the stage where I can begin fine-tuning a pre-trained model, by following these steps.
- Find the annotation format required by the selected pre-trained model. In my case it was DETR, which required one JSON file with all the data in a specific format, which I found by looking at DETR (huggingface.co).
- Split the images into train, validation, and test datasets, as the Hugging Face Hub only allows 10,000 files per dataset.
- For each dataset, create a JSON file in the correct format with the frame data. I wrote a bespoke script in Python to do this.
- Write a script to fine-tune the selected pre-trained model.
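To make the first part concrete, here is a minimal sketch of how the CVAT-for-video XML can be parsed and the frames split. It assumes CVAT's track/box export structure (`<track label="...">` containing `<box frame="..." xtl="..." ytl="..." xbr="..." ybr="..." outside="..."/>`); the inline XML snippet, the split ratios, and the function names are all illustrative, not from the original post.

```python
import random
import xml.etree.ElementTree as ET
from collections import defaultdict

# A tiny CVAT-for-video style XML snippet (structure assumed from CVAT's
# track/box export; real exports carry many more attributes).
CVAT_XML = """<annotations>
  <track id="0" label="car">
    <box frame="0" xtl="10.0" ytl="20.0" xbr="110.0" ybr="80.0" outside="0"/>
    <box frame="1" xtl="12.0" ytl="21.0" xbr="112.0" ybr="81.0" outside="0"/>
  </track>
  <track id="1" label="person">
    <box frame="1" xtl="50.0" ytl="30.0" xbr="70.0" ybr="90.0" outside="0"/>
  </track>
</annotations>"""

def boxes_per_frame(xml_text):
    """Group CVAT track boxes by frame: {frame: [(label, xtl, ytl, xbr, ybr), ...]}."""
    frames = defaultdict(list)
    root = ET.fromstring(xml_text)
    for track in root.iter("track"):
        label = track.get("label")
        for box in track.iter("box"):
            if box.get("outside") == "1":  # object has left the frame
                continue
            frames[int(box.get("frame"))].append(
                (label, float(box.get("xtl")), float(box.get("ytl")),
                 float(box.get("xbr")), float(box.get("ybr"))))
    return dict(frames)

def split_frames(frame_ids, train=0.8, validate=0.1, seed=42):
    """Shuffle frame ids deterministically and split train/validate/test."""
    ids = sorted(frame_ids)
    random.Random(seed).shuffle(ids)
    n_train = int(len(ids) * train)
    n_val = int(len(ids) * validate)
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

frames = boxes_per_frame(CVAT_XML)
train_ids, val_ids, test_ids = split_frames(frames)
```

Splitting by frame id like this is the simplest approach; if the frames come from several videos, splitting by video instead avoids near-duplicate frames leaking between train and test.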
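For the "one JSON file per split" step, the sketch below writes the grouped boxes out in COCO-style detection format (`images` / `annotations` / `categories`, with `bbox` as `[x, y, width, height]`), which is the shape DETR training examples commonly consume. The `file_name` pattern and image dimensions are hypothetical placeholders, and the exact fields your chosen DETR checkpoint expects should be checked against its model card.

```python
import json

def to_coco(frame_boxes, width, height):
    """Convert {frame_id: [(label, xtl, ytl, xbr, ybr), ...]} into a
    COCO-style dict ready to be dumped as one JSON file per split."""
    labels = sorted({label for boxes in frame_boxes.values() for label, *_ in boxes})
    cat_ids = {name: i for i, name in enumerate(labels)}
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": i, "name": name} for name, i in cat_ids.items()],
    }
    ann_id = 0
    for frame_id, boxes in sorted(frame_boxes.items()):
        coco["images"].append({
            "id": frame_id,
            "file_name": f"frame_{frame_id:06d}.png",  # hypothetical naming scheme
            "width": width,
            "height": height,
        })
        for label, xtl, ytl, xbr, ybr in boxes:
            w, h = xbr - xtl, ybr - ytl
            coco["annotations"].append({
                "id": ann_id,
                "image_id": frame_id,
                "category_id": cat_ids[label],
                "bbox": [xtl, ytl, w, h],  # COCO uses [x, y, width, height]
                "area": w * h,
                "iscrowd": 0,
            })
            ann_id += 1
    return coco

# Example: one frame with a single box, dumped to JSON text.
sample = {0: [("car", 10.0, 20.0, 110.0, 80.0)]}
coco = to_coco(sample, width=1920, height=1080)
json_text = json.dumps(coco)
```

Running `to_coco` once per split (train, validation, test) with that split's frames produces the three JSON files described above.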