How do I make a dataset?

I want to make a custom image classification dataset, something like fuliucansheng/pascal_voc · Datasets at Hugging Face. It must be a multi-label classification dataset. I would then like to fine-tune a vision transformer on it. Could someone please help me out?
Multiple Object Detector PASCAL 2007 - a Hugging Face Space by archietram
The end result should be something like this Space.
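To make the target format concrete, here is a minimal sketch of what I mean by multi-label: each image can carry several classes at once, stored as one 0/1 value per class. The class names, file paths, and the `to_multi_hot` helper are made up for illustration and are not taken from the pascal_voc dataset.

```python
# Hypothetical label set and samples, for illustration only.
CLASSES = ["person", "dog", "car"]


def to_multi_hot(labels, classes=CLASSES):
    """Turn a list of class names into a 0.0/1.0 vector, one slot per class."""
    unknown = set(labels) - set(classes)
    if unknown:
        raise ValueError(f"unknown labels: {sorted(unknown)}")
    return [1.0 if name in labels else 0.0 for name in classes]


# Each sample pairs an image path (assumed layout) with its class names.
samples = [
    {"image": "images/0001.jpg", "labels": ["person", "dog"]},
    {"image": "images/0002.jpg", "labels": ["car"]},
]

# Replace the class names with multi-hot vectors.
encoded = [
    {"image": s["image"], "labels": to_multi_hot(s["labels"])}
    for s in samples
]
```

As far as I understand, a vector like this is the kind of target a multi-label classifier is trained against, but I am not sure how to build and load it as a proper dataset.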
Thank you.