Converting a dataset from YOLO format to COCO format for DETR

Hi. I would like to compare two networks on the same dataset: a Transformer-based one (DETR) and a non-Transformer-based one (YOLOv5).
I have already trained a model with YOLOv5, so my dataset is already split into train/val/test in YOLO format: one .txt file per image, where each line is class x_center y_center width height with coordinates normalized to [0, 1] (see Formatting table for an example). My dataset folder looks like this:

.
β”œβ”€β”€ train
β”‚   β”œβ”€β”€ images
β”‚   β”‚   β”œβ”€β”€ ima1.png
β”‚   β”‚   β”œβ”€β”€ ima2.png
β”‚   β”‚   └── ...
β”‚   └── labels
β”‚       β”œβ”€β”€ ima1.txt
β”‚       β”œβ”€β”€ ima2.txt
β”‚       └── ...
β”œβ”€β”€ val
β”‚   β”œβ”€β”€ images
β”‚   β”‚   β”œβ”€β”€ ima3.png
β”‚   β”‚   β”œβ”€β”€ ima4.png
β”‚   β”‚   └── ...
β”‚   └── labels
β”‚       β”œβ”€β”€ ima3.txt
β”‚       β”œβ”€β”€ ima4.txt
β”‚       └── ...
└── test
    β”œβ”€β”€ images
    β”‚   β”œβ”€β”€ ima5.png
    β”‚   β”œβ”€β”€ ima6.png
    β”‚   └── ...
    └── labels
        β”œβ”€β”€ ima5.txt
        β”œβ”€β”€ ima6.txt
        └── ...

Now I want to convert it to COCO format. According to the Hugging Face documentation, DETR expects labels in COCO format, stored as JSON files; however, the tutorial there loads its dataset from the Hugging Face datasets library, while my data lives in local folders. Moreover, I would like to know whether I should create three JSON files, one per split, or a single JSON file containing everything. In the latter case, could you point me to documentation on how the JSON file should be defined?
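For reference, the object-detection tutorial I am looking at does roughly this (written from memory, so the dataset name and fields may not be exact; it uses the cppe-5 dataset from the Hub rather than local files):

from datasets import load_dataset
from transformers import AutoImageProcessor

# The tutorial's dataset has one row per image, with the COCO-style
# annotations grouped under an 'objects' field.
cppe5 = load_dataset("cppe-5")
print(cppe5["train"][0])  # image, image_id, width, height, objects{id, area, bbox, category}

# Image processor that turns those annotations into DETR inputs
image_processor = AutoImageProcessor.from_pretrained("facebook/detr-resnet-50")

This is why I am unsure how to plug my local train/val/test folders into the same pipeline.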
If there is any tutorial on how to prepare data for DETR from a setup like mine, it would be great if you could post it here.
Thank you!

Update

I wrote the following parser to do the conversion.

import os
import json
from PIL import Image
from tqdm import tqdm


def yolo_to_coco(dataset_dir, output_dir):
	# Define categories
	categories = [{'id': 0, 'name': 'person'}]

	# Map COCO-style split names to the folder names used on disk
	split_dirs = {'train': 'train', 'validation': 'val', 'test': 'test'}

	# Initialize data dict
	data = {}

	# Loop over splits
	for split, folder in split_dirs.items():
		split_data = {'info': {}, 'licenses': [], 'images': [], 'annotations': [], 'categories': categories}

		# Get image and label directories for the current split
		image_dir = os.path.join(dataset_dir, folder, 'images')
		label_dir = os.path.join(dataset_dir, folder, 'labels')
		image_files = sorted(os.listdir(image_dir))

		# Loop over images in the current split
		cumulative_id = 0
		with tqdm(total=len(image_files), desc=f'Processing {split} images') as pbar:
			for i, filename in enumerate(image_files):
				image_path = os.path.join(image_dir, filename)
				im = Image.open(image_path)
				im_id = i + 1

				split_data['images'].append({
					'id': im_id,
					'file_name': filename,
					'width': im.size[0],
					'height': im.size[1]
				})

				# Get labels for the current image
				label_path = os.path.join(label_dir, os.path.splitext(filename)[0] + '.txt')
				with open(label_path, 'r') as f:
					yolo_data = f.readlines()

				for line in yolo_data:
					class_id, x_center, y_center, width, height = line.split()
					class_id = int(class_id)
					# YOLO boxes are normalized (x_center, y_center, w, h);
					# COCO expects absolute pixel (x_min, y_min, w, h)
					bbox_x = (float(x_center) - float(width) / 2) * im.size[0]
					bbox_y = (float(y_center) - float(height) / 2) * im.size[1]
					bbox_width = float(width) * im.size[0]
					bbox_height = float(height) * im.size[1]

					split_data['annotations'].append({
						'id': cumulative_id,
						'image_id': im_id,
						'category_id': class_id,
						'bbox': [bbox_x, bbox_y, bbox_width, bbox_height],
						'area': bbox_width * bbox_height,
						'iscrowd': 0
					})

					cumulative_id += 1

				pbar.update(1)

		data[split] = split_data

	# Save each split to its own COCO-format JSON file
	for split, split_data in data.items():
		filename = os.path.join(output_dir, f'{split}.json')
		with open(filename, 'w') as f:
			json.dump(split_data, f)

	return data

dataset_dir = '/home/alberto/Dataset'
output_dir = './'
coco_data = yolo_to_coco(dataset_dir, output_dir)
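As a quick sanity check, the generated files can be loaded with pycocotools (assuming it is installed; './train.json' is the file written by the parser above):

from pycocotools.coco import COCO

# COCO() parses the annotation file and builds an index;
# it will fail if the structure is not valid COCO
coco = COCO('./train.json')
print(len(coco.getImgIds()), 'images,', len(coco.getAnnIds()), 'annotations')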

However, when I want to load my dataset using:

from datasets import load_dataset
data_files = {
	"train": '/home/alberto/Dataset/train/images/train_labels.json',
	"validation": '/home/alberto/Dataset/val/images/val_labels.json',
	"test": '/home/alberto/Dataset/val/images/test_labels.json'
}
dataset = load_dataset("json", data_files=data_files)

Inspecting dataset['train'] shows that the number of rows is 1, which is not correct: it should be 7000, the number of images in the train set. Does anybody know where the error is committed?
Example with a subset of the train set:
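(A truncated, hand-written sketch of what the parser produces, shown as the equivalent Python dict; file names and numbers are made up, not my real data.)

# Truncated sketch of train.json; values are illustrative only
{
	'info': {},
	'licenses': [],
	'images': [
		{'id': 1, 'file_name': 'ima1.png', 'width': 1280, 'height': 720},
		{'id': 2, 'file_name': 'ima2.png', 'width': 1280, 'height': 720}
	],
	'annotations': [
		{'id': 0, 'image_id': 1, 'category_id': 0, 'bbox': [100.0, 200.0, 50.0, 80.0], 'area': 4000.0, 'iscrowd': 0},
		{'id': 1, 'image_id': 2, 'category_id': 0, 'bbox': [300.0, 150.0, 60.0, 90.0], 'area': 5400.0, 'iscrowd': 0}
	],
	'categories': [{'id': 0, 'name': 'person'}]
}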

In order to read it using load_dataset, the JSON must follow the same structure as the one defined here.
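For anyone hitting the same issue: load_dataset("json") creates one row per top-level JSON object (or per line of a JSON Lines file), so a whole COCO dict shows up as a single row. Below is a minimal sketch of a workaround, assuming the target is one record per image with an objects field like the cppe-5 dataset mentioned above (the helper name, field names, and output path are made up for illustration and may need adjusting):

import json

def coco_to_jsonl(coco_json_path, output_jsonl_path):
	# Re-group the COCO annotations by image and write one JSON object per line,
	# so that load_dataset("json", ...) yields one row per image.
	with open(coco_json_path) as f:
		coco = json.load(f)

	anns_per_image = {}
	for ann in coco['annotations']:
		anns_per_image.setdefault(ann['image_id'], []).append(ann)

	with open(output_jsonl_path, 'w') as f:
		for image in coco['images']:
			anns = anns_per_image.get(image['id'], [])
			record = {
				'image_id': image['id'],
				'file_name': image['file_name'],
				'width': image['width'],
				'height': image['height'],
				# Field names modeled on cppe-5; adjust to the target structure
				'objects': {
					'id': [a['id'] for a in anns],
					'area': [a['area'] for a in anns],
					'bbox': [a['bbox'] for a in anns],
					'category': [a['category_id'] for a in anns],
				},
			}
			f.write(json.dumps(record) + '\n')

coco_to_jsonl('./train.json', './train.jsonl')

Pointing data_files at the resulting .jsonl files then gives one row per image instead of one row per split.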