YOLOv7: a deep dive into the current state-of-the-art for object detection

December 16, 2022

YOLO (You Only Look Once) is a popular algorithm for detecting objects in images and video. It was introduced in 2016 by Joseph Redmon, then a PhD student at the University of Washington, and his colleagues, and has since been widely used in computer vision.

Released in July 2022, YOLOv7 is a newer version of the algorithm that improves on the speed and accuracy of earlier versions such as YOLOv5.

*YOLOv7 detecting cars with bounding box predictions*

The YOLO Architecture

The original YOLO architecture consists of several convolutional layers followed by fully connected layers. The input image is passed through this convolutional neural network (CNN) in a single forward pass, and the output is a set of bounding boxes with class probabilities for the objects detected in the image. Overlapping boxes that cover the same object are then filtered out, typically with non-maximum suppression, so that each object keeps only its most confident box.
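The filtering of overlapping boxes can be illustrated with a minimal sketch of greedy non-maximum suppression. This is a plain-Python illustration, not the code used by any YOLO release; the function names `iou` and `nms`, the `(x1, y1, x2, y2)` box layout, and the 0.5 threshold are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a better one.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes around one object, plus one separate object.
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.90, 0.75, 0.80]
kept = nms(boxes, scores)  # the lower-scoring duplicate (index 1) is dropped
```

Production detectors usually run this step per class and on the GPU, but the logic is the same: sort by confidence, then discard any box that overlaps an already kept box by more than the threshold.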

What’s New in YOLOv7

YOLOv7 brings these main improvements:

The use of E-ELAN architecture (Extended Efficient Layer Aggregation Network) instead of ELAN (Efficient Layer Aggregation Network)
New model scaling techniques
Re-parameterization planning
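To give a feel for what re-parameterization means in general, here is a minimal sketch that folds a batch-normalization step into the linear layer that precedes it, so that two operations used during training collapse into one at inference. It is a generic illustration and not YOLOv7's own re-parameterized convolution; the helper names (`linear`, `batchnorm`, `fuse`) and all numeric values are hypothetical. A 1x1 convolution applied at a single pixel is the same matrix-vector product used here.

```python
import math


def linear(W, b, x):
    """y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def batchnorm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch norm using fixed running statistics."""
    return [g * (yi - m) / math.sqrt(v + eps) + bt
            for yi, g, bt, m, v in zip(y, gamma, beta, mean, var)]


def fuse(W, b, gamma, beta, mean, var, eps=1e-5):
    """Fold the batch-norm scale and shift into the layer's weights and bias."""
    scale = [g / math.sqrt(v + eps) for g, v in zip(gamma, var)]
    W_fused = [[s * w for w in row] for row, s in zip(W, scale)]
    b_fused = [s * (bi - m) + bt for s, bi, m, bt in zip(scale, b, mean, beta)]
    return W_fused, b_fused


# Hypothetical parameters for a layer with 2 inputs and 2 outputs.
W = [[0.5, -1.0], [2.0, 0.25]]
b = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [0.9, 1.6]
x = [1.0, 2.0]

two_step = batchnorm(linear(W, b, x), gamma, beta, mean, var)
W_fused, b_fused = fuse(W, b, gamma, beta, mean, var)
one_step = linear(W_fused, b_fused, x)  # same result with a single layer
```

The fused layer produces the same output with fewer operations, which is why this family of techniques can make inference cheaper without changing what the model predicts.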

YOLOv7 proposes a new architecture for real-time object detection, together with a new model scaling method and a trainable “bag-of-freebies”: training-time techniques that improve accuracy without increasing inference cost. At the time of its release, it surpassed other real-time object detectors in both speed and accuracy, as shown in the following figure:

*YOLOv7: Comparison with other real-time object detectors*

In the comparison above, YOLOv7 runs faster than other state-of-the-art object detectors and is reported as 120% faster than YOLOv5.
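To make the percentage concrete, the short sketch below converts "120% faster" into a throughput multiplier. It assumes that "faster" refers to frames per second (FPS); the 50 FPS baseline is a hypothetical number chosen for the example, not a measured result.

```python
def speedup_factor(percent_faster):
    """Turn 'N% faster' into a throughput multiplier."""
    return 1.0 + percent_faster / 100.0


def fps_after(baseline_fps, percent_faster):
    """Throughput after applying the speedup to a baseline FPS."""
    return baseline_fps * speedup_factor(percent_faster)


factor = speedup_factor(120)         # a multiplier of about 2.2
example_fps = fps_after(50.0, 120)   # hypothetical 50 FPS baseline, about 110 FPS
latency_ratio = 1.0 / factor         # each frame takes under half the time
```

Under this reading, "120% faster" means a little more than double the throughput, not a 120% reduction in latency.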

Conclusion

The official YOLOv7 release offers better speed and accuracy than previous versions. It has an optimized architecture and proposes new strategies for practical training and inference. The released models are trained to detect the 80 generic object classes of the COCO dataset. To detect other kinds of objects in your own applications, you need to train YOLOv7 on a custom dataset.
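Preparing a custom dataset usually starts with converting annotations into the format the training code expects. The sketch below assumes the plain-text label convention commonly used with YOLO training code: one line per object, holding a class index followed by the box center, width, and height, all normalized to the image size. The function name `to_yolo_label` and the sample values are hypothetical, and the exact format expected by a given release should be checked against its documentation.

```python
def to_yolo_label(class_id, box, image_width, image_height):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to one label line."""
    x_min, y_min, x_max, y_max = box
    # Normalize the center and size by the image dimensions.
    x_center = (x_min + x_max) / 2.0 / image_width
    y_center = (y_min + y_max) / 2.0 / image_height
    width = (x_max - x_min) / image_width
    height = (y_max - y_min) / image_height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# A 200x200 pixel box in a 640x480 image, labeled as class 0.
line = to_yolo_label(0, (100, 200, 300, 400), 640, 480)
print(line)  # 0 0.312500 0.625000 0.312500 0.416667
```

Normalized coordinates keep the labels valid when images are resized during training.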