I was looking at existing datasets and tutorials, and most of them use a dictionary of lists for their annotations. Taking the object detection case as an example, most datasets use a format for each image of:
* `image`: PIL.Image.Image object containing the image.
* `image_id`: The image ID.
* `height`: The image height.
* `width`: The image width.
* `objects`: A dictionary containing bounding box metadata for the objects in the image:
  * `id`: The annotation ID.
  * `area`: The area of the bounding box.
  * `bbox`: The object's bounding box (in the [coco](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/#coco) format).
  * `category`: The object's category.
```python
{
    'image': <PIL.Image.Image>,
    'image_id': 1,
    'height': 480,
    'width': 640,
    'objects': {
        'id': [1, 2],
        'area': [100, 200],
        'bbox': [[10, 10, 50, 50], [60, 60, 80, 80]],
        'category': [0, 1]
    }
}
```
Here the members of “objects” are parallel lists, with each index corresponding to one annotation.
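For comparison, flipping the dict-of-lists layout into the list-of-dicts layout takes only a few lines of Python. This is a minimal sketch (`objects_to_list` is a hypothetical helper name, and it assumes all the per-field lists have the same length):

```python
def objects_to_list(objects):
    """Convert a dict of parallel lists into a list of per-annotation dicts."""
    keys = list(objects)
    # zip the parallel lists element-wise, then pair each tuple back with the keys
    return [dict(zip(keys, values)) for values in zip(*(objects[k] for k in keys))]

sample_objects = {
    'id': [1, 2],
    'area': [100, 200],
    'bbox': [[10, 10, 50, 50], [60, 60, 80, 80]],
    'category': [0, 1],
}
```

Calling `objects_to_list(sample_objects)` yields exactly the list-of-dicts form shown below.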
But I have chosen to make my dataset format:
* `image`: PIL.Image.Image object containing the image.
* `image_id`: The image ID.
* `height`: The image height.
* `width`: The image width.
* `objects`: A **list of dictionaries** containing bounding box metadata for the objects in the image:
  * `id`: The annotation ID.
  * `area`: The area of the bounding box.
  * `bbox`: The object's bounding box (in the [coco](https://albumentations.ai/docs/getting_started/bounding_boxes_augmentation/#coco) format).
  * `category`: The object's category.
```python
{
    'image': <PIL.Image.Image>,
    'image_id': 1,
    'height': 480,
    'width': 640,
    'objects': [
        {'id': 1, 'area': 100, 'bbox': [10, 10, 50, 50], 'category': 0},
        {'id': 2, 'area': 200, 'bbox': [60, 60, 80, 80], 'category': 1}
    ]
}
```
I chose this format because it is more easily digestible by the training algorithms (no need to change the format) and more easily convertible to COCO JSON.
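To illustrate the COCO-JSON point, each per-image example can be mapped onto COCO `annotations` records with a simple comprehension. This is a sketch under a few assumptions: `to_coco_annotations` is a hypothetical helper name, `bbox` is already `[x, y, width, height]` per the COCO convention, and `iscrowd` defaults to 0 since the source format does not carry it:

```python
def to_coco_annotations(example):
    """Map one dataset example to a list of COCO-style annotation records."""
    return [
        {
            'id': obj['id'],
            'image_id': example['image_id'],
            'category_id': obj['category'],
            'bbox': obj['bbox'],   # already [x, y, width, height] (COCO convention)
            'area': obj['area'],
            'iscrowd': 0,          # assumed default; not present in the source format
        }
        for obj in example['objects']
    ]

sample = {
    'image_id': 1,
    'height': 480,
    'width': 640,
    'objects': [
        {'id': 1, 'area': 100, 'bbox': [10, 10, 50, 50], 'category': 0},
        {'id': 2, 'area': 200, 'bbox': [60, 60, 80, 80], 'category': 1},
    ],
}
```

With the dict-of-lists layout this same conversion would first need the parallel lists zipped back together, which is the extra step the list-of-dicts format avoids.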
Q1) Are there any potential downsides to using this “list of dictionaries” format that I might be missing?
Q2) Is there a recommended “best practice” for custom object detection datasets, or is it flexible as long as the data can be correctly interpreted?
Note: I have mostly looked at object detection datasets, so my perspective on recommended practices is limited.
Note: I am using the terms “list” and “dictionary” in the Pythonic sense; the Arrow format that the datasets library uses under the hood may use different terms.