YOLO Annotation
Lately, I have been reading about YOLO, and this note covers how annotations are stored for training it.
Each image in the dataset should have a corresponding .txt file. Each line in the text file represents one bounding box in the image. The syntax of a line is as follows:
class_id x y w h
x and y are the coordinates of the midpoint of the bounding box. w and h are the width and height of the bounding box. The values of x, y, w and h are expressed relative to the image size (as ratios).
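As a quick illustration of the line syntax described above, here is a minimal sketch that splits one annotation line into its five fields. The function name parse_yolo_line and the sample values are hypothetical, chosen only for this example, and the sketch assumes the fields are separated by whitespace.

```python
def parse_yolo_line(line):
    """Split one annotation line into (class_id, x, y, w, h)."""
    parts = line.split()
    if len(parts) != 5:
        raise ValueError("expected 5 fields, got {}".format(len(parts)))
    class_id = int(parts[0])                     # class index is an integer
    x, y, w, h = (float(v) for v in parts[1:])   # ratios relative to the image
    return class_id, x, y, w, h


# Hypothetical line: class 0, box centred in the image,
# a quarter of the image wide and 40% of it tall.
sample = "0 0.5 0.5 0.25 0.4"
parsed = parse_yolo_line(sample)
print(parsed)  # (0, 0.5, 0.5, 0.25, 0.4)
```

Reading a whole annotation file would then amount to calling this helper on each non-empty line.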
Let’s say im_w and im_h are the width and height of an image, and (x_min, y_min) and (x_max, y_max) are two diagonally opposite corners of a bounding box. To convert them into the YOLO format, we first find the midpoint of the bounding box: ((x_min + x_max) / 2, (y_min + y_max) / 2). The width and height of the bounding box are given by x_max - x_min and y_max - y_min. Then, to express the values relative to the image, we divide the horizontal values by im_w and the vertical values by im_h.
x = (x_min + x_max) / (2 * im_w)
y = (y_min + y_max) / (2 * im_h)
w = (x_max - x_min) / im_w
h = (y_max - y_min) / im_h
Below is the Python code.
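What follows is a minimal sketch of the conversion formulas above, assuming Python 3 (so that / is true division) and corner coordinates given in pixels. The function names to_yolo and yolo_line, the six-decimal formatting, and the sample image size and box are assumptions made for this example and are not taken from any particular tool.

```python
def to_yolo(im_w, im_h, x_min, y_min, x_max, y_max):
    """Convert pixel corner coordinates to relative (x, y, w, h)."""
    x = (x_min + x_max) / (2 * im_w)   # midpoint x, relative to image width
    y = (y_min + y_max) / (2 * im_h)   # midpoint y, relative to image height
    w = (x_max - x_min) / im_w         # box width, relative to image width
    h = (y_max - y_min) / im_h         # box height, relative to image height
    return x, y, w, h


def yolo_line(class_id, box):
    """Format one annotation line as: class_id x y w h."""
    return "{} {:.6f} {:.6f} {:.6f} {:.6f}".format(class_id, *box)


# Hypothetical 640x480 image with a box from (100, 120) to (300, 360).
box = to_yolo(640, 480, 100, 120, 300, 360)
line = yolo_line(0, box)
print(line)  # 0 0.312500 0.500000 0.312500 0.500000
```

Each such line would be written to the .txt file that accompanies the image, one line per bounding box.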
Manivannan has shared an annotation tool and written about how to use it. Please note that his tool is written for Python 2 and will work on Python 3 with some edits.