Prepare dataset from YOLO format to COCO for DETR

Hi. I would like to compare two networks on the same dataset: one Transformer-based (DETR) and one non-Transformer-based (YOLOv5).
I have already trained a model with YOLOv5, so my dataset is already split into train, val, and test sets in YOLO format. See the Formatting table for an example. My dataset folder looks like this:

├── train
    └── images
    │   ├── ima1.png
    │   ├── ima2.png
    │   ├── ...
    └── labels
    │   ├── ima1.txt
    │   ├── ima2.txt
    │   ├── ...
├── val
    └── images
    │   ├── ima3.png
    │   ├── ima4.png
    │   ├── ...
    └── labels
    │   ├── ima3.txt
    │   ├── ima4.txt
    │   ├── ...
├── test
    └── images
    │   ├── ima5.png
    │   ├── ima6.png
    │   ├── ...
    └── labels
    │   ├── ima5.txt
    │   ├── ima6.txt
    │   ├── ...

Now I want to convert it to COCO format. According to the Hugging Face documentation, DETR requires labels in COCO format, stored as JSON files. However, the example there uses a dataset loaded from the Hugging Face datasets library. I would also like to know whether I should create three JSON files, one per split, or a single JSON file containing everything. In the latter case, could you point me to documentation on how the JSON file should be structured?
If there is a tutorial on how to prepare data for DETR that fits my setup, it would be great if someone could post it here.
Thank you!


I wrote the following parser to convert it.

import os
import json
from PIL import Image
from tqdm import tqdm

def yolo_to_coco(image_dir, label_dir, output_dir):
	# Define categories
	categories = [{'id': 0, 'name': 'person'}]

	# Initialize data dict
	data = {'train': [], 'validation': [], 'test': []}

	# Loop over splits
	for split in ['train', 'validation', 'test']:
		split_data = {'info': {}, 'licenses': [], 'images': [], 'annotations': [], 'categories': categories}

		# Get image and label files for current split
		# NOTE: image_dir and label_dir do not depend on `split`, so every split lists the same files
		image_files = sorted(os.listdir(image_dir))
		label_files = sorted(os.listdir(label_dir))

		# Loop over images in current split
		cumulative_id = 0
		with tqdm(total=len(image_files), desc=f'Processing {split} images') as pbar:
			for i, filename in enumerate(image_files):
				image_path = os.path.join(image_dir, filename)
				im = Image.open(image_path)
				im_id = i + 1

				# Register the image entry (COCO 'images' list)
				split_data['images'].append({
					'id': im_id,
					'file_name': filename,
					'width': im.size[0],
					'height': im.size[1]
				})

				# Get labels for current image
				label_path = os.path.join(label_dir, os.path.splitext(filename)[0] + '.txt')
				with open(label_path, 'r') as f:
					yolo_data = f.readlines()

				for line in yolo_data:
					class_id, x_center, y_center, width, height = line.split()
					class_id = int(class_id)
					bbox_x = (float(x_center) - float(width) / 2) * im.size[0]
					bbox_y = (float(y_center) - float(height) / 2) * im.size[1]
					bbox_width = float(width) * im.size[0]
					bbox_height = float(height) * im.size[1]

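					# Register the box entry (COCO 'annotations' list, bbox in pixels)
					split_data['annotations'].append({
						'id': cumulative_id,
						'image_id': im_id,
						'category_id': class_id,
						'bbox': [bbox_x, bbox_y, bbox_width, bbox_height],
						'area': bbox_width * bbox_height,
						'iscrowd': 0
					})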

					cumulative_id += 1

				pbar.update(1)

		data[split] = split_data

	# Save data to JSON files
	for split in ['train', 'validation', 'test']:
		filename = os.path.join(output_dir, f'{split}.json')
		with open(filename, 'w') as f:
			json.dump({'data': data[split]}, f)

	return data

image_dir = '/home/alberto/Dataset/train/images'
label_dir = '/home/alberto/Dataset/train/labels'
output_dir = './'
coco_data = yolo_to_coco(image_dir, label_dir, output_dir)
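
The box arithmetic in the parser can be checked in isolation. The sketch below is only an illustration: `yolo_bbox_to_coco` is a hypothetical helper (not part of any library), and it assumes YOLO labels are normalized `x_center y_center width height` while COCO expects `[x_min, y_min, width, height]` in pixels, as the parser above does.

```python
def yolo_bbox_to_coco(x_center, y_center, width, height, img_w, img_h):
    """Convert a normalized YOLO box (center x/y, width, height)
    to a COCO box [x_min, y_min, width, height] in pixels."""
    box_w = width * img_w
    box_h = height * img_h
    # Move from the box center to its top-left corner, then scale to pixels
    x_min = (x_center - width / 2) * img_w
    y_min = (y_center - height / 2) * img_h
    return [x_min, y_min, box_w, box_h]


# A box centered in a 640x480 image, a quarter as wide and half as tall
example = yolo_bbox_to_coco(0.5, 0.5, 0.25, 0.5, 640, 480)
print(example)  # [240.0, 120.0, 160.0, 240.0]
```

A quick check like this makes it easier to rule out the coordinate conversion when something looks wrong further down the pipeline.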

However, when I want to load my dataset using:

from datasets import load_dataset
data_files = {
	"train": '/home/alberto/Dataset/train/images/train_labels.json',
	"validation": '/home/alberto/Dataset/val/images/val_labels.json',
	"test": '/home/alberto/Dataset/val/images/test_labels.json'
}
dataset = load_dataset("json", data_files=data_files)

Typing dataset['train'] reports that the number of rows is 1, which is not correct. It should be 7000, the number of images in the train set. Does anybody know where the error is?
Example with subset of train set:

To read it with load_dataset, the file must follow the same structure as defined
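
A likely reason for the single row is that each file holds one top-level JSON object, which the JSON loader reads as one record. A minimal sketch of one workaround, under stated assumptions: regroup the COCO dictionary into one flat record per image and write it as JSON Lines (one object per line), so that each image becomes a row. The helpers `coco_to_records` and `to_jsonl` are hypothetical, and the field names (`image_id`, `objects`, `category`, and so on) are an assumed layout, not something the thread confirms.

```python
import json


def coco_to_records(coco):
    """Group COCO annotations by image into one flat record per image."""
    by_image = {}
    for ann in coco["annotations"]:
        by_image.setdefault(ann["image_id"], []).append(ann)

    records = []
    for img in coco["images"]:
        anns = by_image.get(img["id"], [])  # images without boxes get empty lists
        records.append({
            "image_id": img["id"],
            "file_name": img["file_name"],
            "width": img["width"],
            "height": img["height"],
            "objects": {
                "id": [a["id"] for a in anns],
                "bbox": [a["bbox"] for a in anns],
                "category": [a["category_id"] for a in anns],
                "area": [a["area"] for a in anns],
            },
        })
    return records


def to_jsonl(records):
    """Serialize records as JSON Lines: one JSON object per line."""
    return "\n".join(json.dumps(rec) for rec in records) + "\n"


# Tiny made-up COCO dict with two images, only one of which has a box
coco = {
    "images": [
        {"id": 1, "file_name": "ima1.png", "width": 640, "height": 480},
        {"id": 2, "file_name": "ima2.png", "width": 640, "height": 480},
    ],
    "annotations": [
        {"id": 0, "image_id": 1, "category_id": 0,
         "bbox": [240.0, 120.0, 160.0, 240.0], "area": 38400.0, "iscrowd": 0},
    ],
    "categories": [{"id": 0, "name": "person"}],
}
records = coco_to_records(coco)
jsonl_text = to_jsonl(records)  # write this string to e.g. train.jsonl
```

Saving `jsonl_text` to a file and passing that file in `data_files` should then give one row per image rather than one row per file. Whether the nested `objects` layout suits the DETR preprocessing you use is something to verify against your own pipeline.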