Creating Object Detection Datasets
Converting an object detection dataset to Hub format is a good way to get started with datasets of increasing complexity.

How to convert a YOLO object detection dataset to Hub format

This tutorial is also available as a Colab Notebook

Object detection using bounding boxes is one of the most common annotation types for Computer Vision datasets. This tutorial demonstrates how to convert an object detection dataset from YOLO format to Hub, and a similar method can be used to convert object detection datasets from other formats such as COCO and PASCAL VOC.
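As a rough illustration of how another format maps onto the one used in this tutorial, the sketch below converts a COCO-style box, given as [x_min, y_min, width, height] in pixels, into YOLO-style fractional center coordinates. The function name coco_to_yolo is hypothetical and is not part of Hub.

```python
def coco_to_yolo(box, img_w, img_h):
    """Convert [x_min, y_min, width, height] in pixels to
    (x_center, y_center, width, height) as fractions of the image size."""
    x_min, y_min, w, h = box
    x_center = (x_min + w / 2) / img_w
    y_center = (y_min + h / 2) / img_h
    return (x_center, y_center, w / img_w, h / img_h)


# Made-up box in a 200x100 pixel image
example = coco_to_yolo([50, 25, 100, 50], img_w=200, img_h=100)
print(example)  # (0.5, 0.5, 0.5, 0.5)
```

PASCAL VOC boxes are stored as pixel corners (x_min, y_min, x_max, y_max), so the width and height would first be computed as differences before applying the same normalization.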

Create the Hub Dataset

The first step is to download the small dataset below, called animals object detection (animals_od.zip, 278 KB).
The dataset has the following folder structure:
data_dir
|_images
    |_image_1.jpg
    |_image_2.jpg
    |_image_3.jpg
    |_image_4.jpg
|_boxes
    |_image_1.txt
    |_image_2.txt
    |_image_3.txt
    |_image_4.txt
    |_classes.txt
Now that you have the data, let's create a Hub Dataset in the ./animals_od_hub folder by running:
import hub
from PIL import Image, ImageDraw
import numpy as np
import os

ds = hub.empty('./animals_od_hub') # Create the dataset locally
Next, let's specify the folder paths containing the images and annotations in the dataset. In YOLO format, images and annotations are typically matched using a common filename, such as image -> filename.jpeg and annotation -> filename.txt. It's also helpful to create a list of all of the image files and the class names contained in the dataset.
img_folder = './animals_od/images'
lbl_folder = './animals_od/boxes'

# List of all images (sorted, so that the sample order is reproducible)
fn_imgs = sorted(os.listdir(img_folder))

# List of all class names
with open(os.path.join(lbl_folder, 'classes.txt'), 'r') as f:
    class_names = f.read().splitlines()
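The filename matching described above can be sketched in plain Python. The helper name pair_images_with_labels and the file listings are hypothetical; the sketch only shows how a shared filename stem links an image to its annotation file, and how an image without one could be detected.

```python
import os


def pair_images_with_labels(img_files, lbl_files):
    """Return (image, label-or-None) pairs matched on the filename stem."""
    labels_by_stem = {os.path.splitext(f)[0]: f for f in lbl_files}
    pairs = []
    for img in sorted(img_files):
        stem = os.path.splitext(img)[0]
        # .get() returns None when no annotation file shares the stem
        pairs.append((img, labels_by_stem.get(stem)))
    return pairs


# Hypothetical directory listings
imgs = ['image_2.jpg', 'image_1.jpg', 'image_3.jpg']
lbls = ['image_1.txt', 'image_2.txt', 'classes.txt']

pairs = pair_images_with_labels(imgs, lbls)
for img, lbl in pairs:
    print(img, '->', lbl)
```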
Since annotations in YOLO are typically stored in text files, it's useful to write a helper function that parses the annotation file and returns numpy arrays with the bounding box coordinates and bounding box classes.
def read_yolo_boxes(fn: str):
    """
    Reads a YOLO label file and returns two numpy arrays: yolo_boxes with the
    box geometry and yolo_labels with the corresponding box labels.
    """

    with open(fn) as box_f:
        lines = box_f.read()

    # Split the file contents so that each box is a separate line
    lines_split = lines.splitlines()

    yolo_boxes = np.zeros((len(lines_split), 4))
    yolo_labels = np.zeros(len(lines_split))

    # Go through each line and parse data
    for l, line in enumerate(lines_split):
        line_split = line.split()
        yolo_boxes[l, :] = np.array((float(line_split[1]), float(line_split[2]), float(line_split[3]), float(line_split[4])))
        yolo_labels[l] = int(line_split[0])

    return yolo_boxes, yolo_labels
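For reference, each line of a YOLO annotation file describes one box as class_id x_center y_center width height, with the four coordinates expressed as fractions of the image width and height. The minimal sketch below parses a single made-up line in plain Python; parse_yolo_line is a hypothetical name used only for illustration.

```python
def parse_yolo_line(line):
    """Split one YOLO annotation line into (class_id, box)."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    class_id = int(fields[0])
    box = tuple(float(v) for v in fields[1:])
    return class_id, box


# Made-up annotation line: class 1, box centered at (0.5, 0.25)
class_id, box = parse_yolo_line("1 0.5 0.25 0.125 0.75")
print(class_id, box)
```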
Finally, let's create the tensors and iterate through all the images in the dataset in order to populate the data in Hub. Boxes and their labels will be stored in separate tensors, and for a given sample, the first axis of the boxes array corresponds to the first-and-only axis of the labels array (i.e. if there are 3 boxes in an image, the labels array is 3x1 and the boxes array is 3x4).
with ds:
    ds.create_tensor('images', htype='image', sample_compression='jpeg')
    ds.create_tensor('labels', htype='class_label', class_names=class_names)
    ds.create_tensor('boxes', htype='bbox')

    # Define the format of the bounding boxes. YOLO boxes are stored as
    # fractional (x_center, y_center, width, height), hence mode 'CCWH'.
    ds.boxes.info.update(coords={'type': 'fractional', 'mode': 'CCWH'})

    for fn_img in fn_imgs:

        img_name = os.path.splitext(fn_img)[0]
        fn_box = img_name + '.txt'

        # Get the arrays for the bounding boxes and their classes
        yolo_boxes, yolo_labels = read_yolo_boxes(os.path.join(lbl_folder, fn_box))

        # Append data to tensors
        ds.append({'images': hub.read(os.path.join(img_folder, fn_img)),
                   'labels': yolo_labels.astype(np.uint32),
                   'boxes': yolo_boxes.astype(np.float32)
                   })
In order for Activeloop Platform to correctly visualize the labels, class_names must be a list of strings, where the numerical labels correspond to the index of the label in the list.
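The correspondence between numeric labels and class names can be illustrated with a short sketch. The class names and labels below are made up for the example and are not taken from the animals dataset.

```python
# Hypothetical class names, as they might be read from classes.txt
class_names = ['bird', 'cat', 'dog']

# Numeric labels for one image with three boxes
labels = [2, 0, 2]

# Each numeric label is the index of its name in class_names
label_names = [class_names[i] for i in labels]
print(label_names)  # ['dog', 'bird', 'dog']
```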

Inspect the Hub Dataset

Let's check out the third sample from this dataset, which contains two bounding boxes.
# Draw bounding boxes for the third image (index 2)

ind = 2
img = Image.fromarray(ds.images[ind].numpy())
draw = ImageDraw.Draw(img)
(w, h) = img.size
boxes = ds.boxes[ind].numpy()

for b in range(boxes.shape[0]):
    # Convert the fractional box center to pixels
    (xc, yc) = (int(boxes[b][0]*w), int(boxes[b][1]*h))
    # Top-left and bottom-right corners in pixels
    (x1, y1) = (int(xc-boxes[b][2]*w/2), int(yc-boxes[b][3]*h/2))
    (x2, y2) = (int(xc+boxes[b][2]*w/2), int(yc+boxes[b][3]*h/2))
    draw.rectangle([x1, y1, x2, y2], width=2)
    draw.text((x1, y1), ds.labels.info.class_names[ds.labels[ind].numpy()[b]])
# Display the image and its bounding boxes
img
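The coordinate arithmetic used for drawing can also be written as a small standalone function, which makes it easier to check by hand. This is a sketch that mirrors the conversion above; the name ccwh_to_corners is hypothetical and is not a Hub function.

```python
def ccwh_to_corners(box, img_w, img_h):
    """Convert a fractional (x_center, y_center, width, height) box to
    pixel corners (x1, y1, x2, y2), truncating to integers."""
    x_frac, y_frac, w_frac, h_frac = box
    xc, yc = int(x_frac * img_w), int(y_frac * img_h)
    half_w, half_h = w_frac * img_w / 2, h_frac * img_h / 2
    return (int(xc - half_w), int(yc - half_h),
            int(xc + half_w), int(yc + half_h))


# Made-up box centered in a 200x100 pixel image
corners = ccwh_to_corners((0.5, 0.5, 0.25, 0.5), img_w=200, img_h=100)
print(corners)  # (75, 25, 125, 75)
```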
Congrats! You just created a beautiful object detection dataset! 🎉
Note: For optimal object detection model performance, it is often important for datasets to contain images with no annotations (see the 4th sample in the dataset above). Empty samples can be appended using:
ds.boxes.append(None)
or by specifying an empty array whose len(shape) is equal to that of the other samples in the tensor:
ds.boxes.append(np.zeros((0,4))) # len(sample.shape) == 2
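When building such an empty array, note that np.zeros expects the shape as a single tuple, since its second positional argument is the dtype. The sketch below only checks the shapes with NumPy and does not touch a Hub dataset; the matching empty labels array is an assumption based on the one-dimensional labels used in this tutorial.

```python
import numpy as np

# Zero boxes, each with 4 coordinates: the shape must be passed as a tuple
empty_boxes = np.zeros((0, 4), dtype=np.float32)

# Assumed counterpart for the labels tensor: zero labels in a 1-D array
empty_labels = np.zeros((0,), dtype=np.uint32)

print(empty_boxes.shape, empty_boxes.ndim)
print(empty_labels.shape, empty_labels.ndim)
```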