Improved Detection Algorithm for Target Detection in High-Resolution Optical Remote Sensing Images
Currently, it is difficult to strike an appropriate balance between detection efficiency and detection accuracy. We have developed an enhanced YOLOv5 algorithm for target detection in high-resolution optical remote sensing images. It uses a multi-layer feature pyramid, a multi-detection-head strategy and a hybrid attention module to improve the detection network's performance on optical remote sensing images. On the SIMD dataset, the mAP of the new algorithm is 2.2% higher than that of YOLOv5 and 8.48% higher than that of YOLOX, achieving a better balance between detection accuracy and speed.
With the rapid development of remote sensing technology, high-resolution optical remote sensing images have been used to depict many objects on the Earth's surface, including airplanes, cars and buildings. Object detection plays a vital role in interpreting remote sensing images and supports tasks such as segmentation, description and target tracking. However, because they are captured from high altitude with a relatively large field of view, aerial optical remote sensing images exhibit scale diversity, viewpoint specificity, arbitrary object orientation and high background complexity, whereas most traditional datasets contain ground-level views. As a result, traditional detection techniques built on hand-crafted features have historically varied widely in accuracy and speed. Driven by practical demand and by advances in deep learning, using neural networks for target detection in optical remote sensing images has become a necessity.
Current deep-learning-based target detection algorithms for optical remote sensing images can be divided into three types: supervised, unsupervised and weakly supervised. Because unsupervised and weakly supervised algorithms are complex and their results less certain, supervised algorithms are the most commonly used. Supervised object detection algorithms can be further divided into single-stage and two-stage methods. One approach assumes that aircraft are usually located at airports and ships at ports or at sea: it first detects airports and ports in downsampled satellite images and then maps the detected regions back to the original ultra-high-resolution images, which allows objects of different sizes to be detected simultaneously. Other researchers have proposed a rotated target detection method based on R-CNN, which improves detection accuracy in remote sensing images by handling the arbitrary orientation of targets.
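The coarse-to-fine idea of finding airports or ports on a downsampled image and then searching the full-resolution crop comes down to two coordinate transforms: scaling a region back up, and shifting crop-local detections into global coordinates. The sketch below assumes `(x1, y1, x2, y2)` pixel boxes and a single integer downsampling factor; the function names and the `detect_in_crop` callback are hypothetical, not taken from the cited work.

```python
# Hypothetical helpers: names and the (x1, y1, x2, y2) box format are assumptions.

def scale_box(box, factor):
    """Map a box from a downsampled image back to full-resolution pixels."""
    x1, y1, x2, y2 = box
    return (x1 * factor, y1 * factor, x2 * factor, y2 * factor)


def to_global(local_box, region_box):
    """Shift a box detected inside a cropped region into full-image coordinates."""
    rx1, ry1, _, _ = region_box
    x1, y1, x2, y2 = local_box
    return (x1 + rx1, y1 + ry1, x2 + rx1, y2 + ry1)


def two_stage_detect(region_boxes_small, factor, detect_in_crop):
    """Stage 1 found regions (e.g. airports) on the downsampled image;
    stage 2 runs `detect_in_crop` on each full-resolution region and
    returns all object boxes in full-image coordinates."""
    results = []
    for region_small in region_boxes_small:
        region = scale_box(region_small, factor)
        for local in detect_in_crop(region):
            results.append(to_global(local, region))
    return results
```

For example, an airport found at `(10, 20, 60, 70)` on a 1/8-scale image corresponds to `(80, 160, 480, 560)` in the original image, and an aircraft at `(5, 5, 25, 25)` inside that crop lands at `(85, 165, 105, 185)`.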
Most current YOLO-series detection heads take their input from the output features of FPN or PAFPN. FPN-based networks such as YOLOv3 and its variants, shown in Figure a below, output the one-way (top-down) fused features directly. YOLOv4 and YOLOv5, which are based on PAFPN, add a bottom-up path from low-level to high-level features on top of this, passing low-level information directly upward (Figure b below).
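The difference between the two necks can be seen in a toy NumPy sketch. It assumes single-channel feature maps, fuses by element-wise addition, and uses nearest-neighbour upsampling and 2x2 max pooling as stand-ins; real YOLO necks use convolutions and channel concatenation instead, so this only illustrates the direction of information flow.

```python
import numpy as np


def upsample2x(x):
    # nearest-neighbour upsampling: repeat each pixel along both spatial axes
    return x.repeat(2, axis=0).repeat(2, axis=1)


def downsample2x(x):
    # 2x2 max pooling with stride 2 (stand-in for a strided convolution)
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))


def fpn(c3, c4, c5):
    """Top-down only: high-level semantics flow down to the finer maps."""
    p5 = c5
    p4 = c4 + upsample2x(p5)
    p3 = c3 + upsample2x(p4)
    return p3, p4, p5


def pafpn(c3, c4, c5):
    """FPN followed by an extra bottom-up path from low to high levels."""
    p3, p4, p5 = fpn(c3, c4, c5)
    n3 = p3
    n4 = p4 + downsample2x(n3)
    n5 = p5 + downsample2x(n4)
    return n3, n4, n5
```

With `c3 = np.zeros((8, 8))`, `c4 = np.zeros((4, 4))` and `c5 = np.ones((2, 2))`, the FPN output `p5` stays all ones, while the PAFPN output `n5` becomes all threes because low-level information has travelled back up the bottom-up path.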
As shown in the figure above, some studies add an extra detection head for a specific detection task, as in the TPH-YOLOv5 model. In Figures b and c above, only the PAFPN features are used for output, while the FPN features are not fully exploited. YOLOv7 therefore attaches three auxiliary heads to the FPN outputs, as shown in Figure d above, although these auxiliary heads are used only for "coarse selection" and are assigned a lower weight. The SSD-style detection head addresses the overly coarse anchor design of the YOLO network by introducing a dense, multi-scale anchor design. As shown in Figure f, this strategy can use the feature information of both PANet and FPN simultaneously. In addition, a 64x downsampling branch is added directly to the output, so that the network also carries global information from the preceding layers.
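To get a feel for what an extra 64x downsampling head costs, the sketch below counts grid cells and box predictions per head for a square input. The stride sets, the 640-pixel input and three anchors per cell are assumptions chosen to mirror common YOLOv5 settings, not values stated in the article.

```python
def head_grid_sizes(img_size, strides):
    """Grid side length of each detection head for a square input."""
    return {s: img_size // s for s in strides}


def total_predictions(img_size, strides, anchors_per_cell=3):
    # every grid cell of every head predicts `anchors_per_cell` boxes
    return sum((img_size // s) ** 2 * anchors_per_cell for s in strides)


# Assumed configurations: standard three heads vs. an added stride-64 head.
BASE_STRIDES = (8, 16, 32)
WITH_P6 = (8, 16, 32, 64)
```

At 640x640, the three standard heads yield 25,200 predictions, and a stride-64 head adds only a 10x10 grid (300 more predictions). That illustrates why a coarse, global-context head is cheap relative to the fine-grained ones.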
The multi-detection-head approach makes effective use of the network's output features. The improved YOLO network for object detection in high-resolution remote sensing images is shown in the figure below:
The backbone is a CSP (Cross Stage Partial) network built around C3 and Conv modules. After data augmentation, images are fed into the network; a Conv module with a kernel size of 6 first mixes the channels, and a series of convolutional modules then extract features. These features pass through a feature enhancement module called SPPF and are then connected to the PANet in the neck, where two-way (top-down and bottom-up) feature fusion is performed to improve the network's detection ability. A separate Conv2d layer is applied to each fused feature layer to generate the multi-layer outputs. As shown in the figure below, the NMS algorithm combines the outputs of all single-layer detectors to produce the final detection boxes.
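Merging the per-head outputs with NMS can be sketched in plain Python: pool the detections from every head, sort by confidence, and greedily drop any box that overlaps a kept box of the same class. The `(box, score, cls)` tuple format, the helper names and the 0.45 IoU threshold are assumptions for illustration; production code would typically use a vectorized NMS routine instead.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def merge_heads_nms(head_outputs, iou_thr=0.45):
    """Pool (box, score, cls) detections from every head, then run greedy
    per-class NMS so overlapping boxes from different heads collapse."""
    dets = [d for head in head_outputs for d in head]
    dets.sort(key=lambda d: d[1], reverse=True)  # highest confidence first
    kept = []
    for box, score, cls in dets:
        # keep the box unless a kept box of the same class overlaps it too much
        if all(k[2] != cls or iou(box, k[0]) < iou_thr for k in kept):
            kept.append((box, score, cls))
    return kept
```

If a fine head reports `((0, 0, 10, 10), 0.6, 0)` and a coarser head reports `((1, 1, 10, 10), 0.9, 0)`, their IoU is 0.81, so only the 0.9 box survives; a box of a different class is never suppressed by it.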
Figure b below describes the structural composition of each module of the improved YOLO network.
Conv consists of a 2D convolution layer, a batch normalization (BN) layer and a SiLU activation function; C3 consists of two 2D convolution layers and a bottleneck layer; and Upsample is an upsampling layer. The SPPF module is a faster version of the SPP module, the MAB module is as described above, and the ECA module is shown in the lower-left corner. ECA applies channel-wise global average pooling without dimensionality reduction and then uses a fast 1D convolution of size k to capture local cross-channel interactions, relating each channel to its k neighbors, which keeps the attention computation efficient. The two transformations above aggregate features along the two spatial directions to produce a pair of direction-aware feature maps, which are then concatenated and passed through a convolution and a sigmoid function to produce the attention output.
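The ECA step described above (global average pooling, a size-k 1D convolution across channels, then a sigmoid gate) can be sketched in NumPy for a single `(C, H, W)` feature map. The adaptive kernel-size rule with `gamma=2, b=1` follows the original ECA-Net formulation; the uniform kernel weights are an assumption standing in for learned parameters, and this is not the article's implementation.

```python
import math

import numpy as np


def eca_kernel_size(channels, gamma=2, b=1):
    # adaptive kernel size from the ECA-Net formulation, forced to be odd
    t = int(abs((math.log2(channels) + b) / gamma))
    return t if t % 2 else t + 1


def eca(x, weights=None):
    """x: feature map of shape (C, H, W). Returns (reweighted x, attention)."""
    c = x.shape[0]
    k = eca_kernel_size(c)
    if weights is None:
        weights = np.full(k, 1.0 / k)  # placeholder for the learned 1D kernel
    y = x.mean(axis=(1, 2))  # channel-wise global average pooling, no reduction
    padded = np.pad(y, k // 2)  # zero padding keeps one output per channel
    y = np.correlate(padded, weights, mode="valid")  # each channel sees k neighbours
    attn = 1.0 / (1.0 + np.exp(-y))  # sigmoid gate in (0, 1)
    return x * attn[:, None, None], attn
```

For 64 channels the rule gives k = 3 and for 256 channels k = 5, so the number of attention parameters grows only logarithmically with channel count, which is the efficiency argument made for ECA.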
The SIMD dataset is a multi-category, open-source, high-resolution remote sensing object detection dataset containing 15 categories in total, as shown in Figure 4. In addition, most targets in SIMD are small or medium-sized (w
The output of the SPPF module can be connected to an output head to identify large targets in the image. However, the SPPF output has multiple connections and involves targets at multiple scales, so feeding it directly to the detection head for large objects leads to poor model representation. The figure above compares heatmaps of some detection results before and after adding the MAB module. With the MAB module, this detection head concentrates on large targets and leaves small targets to the other prediction heads, which improves the model's representation and better matches the YOLO practice of assigning detection heads according to target size.
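In YOLOv5, the size-based division of labour between heads comes from anchor matching: a ground-truth box is assigned to a head only if one of that head's anchors is within a width/height ratio of `anchor_t` in both directions. The sketch below assumes YOLOv5's default anchors and `anchor_t = 4.0`, and ignores the neighbouring-cell assignment step; the improved network's own anchors are not given in the article.

```python
# Assumed: YOLOv5 default anchors (pixels at a 640 input) and anchor_t = 4.0.
ANCHORS = {
    "P3": [(10, 13), (16, 30), (33, 23)],
    "P4": [(30, 61), (62, 45), (59, 119)],
    "P5": [(116, 90), (156, 198), (373, 326)],
}


def matching_heads(w, h, anchors=ANCHORS, anchor_t=4.0):
    """Return the heads with at least one anchor whose width and height
    ratios to the target (in either direction) stay below anchor_t."""
    heads = []
    for name, head_anchors in anchors.items():
        for aw, ah in head_anchors:
            r = max(w / aw, aw / w, h / ah, ah / h)
            if r < anchor_t:
                heads.append(name)
                break  # one matching anchor is enough for this head
    return heads
```

A 300x300 target matches only the stride-32 head, while a 20x20 target matches both P3 and P4. Because small and medium targets can be shared between heads, keeping the large-object head focused on large targets, as the MAB module is reported to do, matters for a clean division of labour.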
Some test results are shown in the figure above. Judged by individual detection results, there is little visible difference from the other algorithms. However, compared with them, our algorithm improves the model's detection performance without significantly increasing inference time, and uses the attention mechanism to enhance the model's representation.