# Deep learning image segmentation: an overview of network structure design
This article summarizes innovations in network structure for CNN-based image semantic segmentation. These innovations fall into two groups: the design of new neural architectures (different depths, widths, connectivity, and topologies) and the design of new components or layers. The former assembles existing components into complex large-scale networks, while the latter focuses on designing low-level building blocks. We first introduce some classic semantic segmentation networks and their innovations, and then survey applications of network structure design in medical image segmentation.
## FCN Network
(Figure: FCN overall architecture, simplified diagram)

The FCN network is listed separately because it was the first network to approach semantic segmentation from a new perspective. Earlier neural-network-based semantic segmentation methods predicted the label of each pixel from an image patch centered on that pixel, typically with a CNN + FC (fully connected) architecture. This approach cannot exploit the global context of the image, and pixel-by-pixel inference is very slow. FCN instead discards the fully connected layers and builds the network entirely from convolutional layers; through transposed convolutions and the fusion of features from different layers, the network directly outputs a prediction mask for the input image, greatly improving both efficiency and accuracy.
(Figure: schematic of feature fusion across different FCN layers)
Innovation points: fully convolutional network (no FC layers); transposed convolution ("deconvolution"); skip connections that fuse feature maps from different layers (element-wise addition).
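The two FCN innovations above can be illustrated with a toy NumPy sketch (single channel, fixed kernel; the function name and kernel values are hypothetical, and a real FCN learns the transposed-convolution kernel):

```python
import numpy as np

def transposed_conv2d(x, kernel, stride=2):
    """Upsample a single-channel score map with a strided transposed convolution."""
    h, w = x.shape
    k = kernel.shape[0]
    out = np.zeros((h * stride + k - stride, w * stride + k - stride))
    for i in range(h):
        for j in range(w):
            # each input value "paints" a k x k patch, patches overlap-add
            out[i*stride:i*stride+k, j*stride:j*stride+k] += x[i, j] * kernel
    return out

# Coarse 2x2 score map from a deep layer, upsampled 2x.
coarse = np.array([[1.0, 2.0], [3.0, 4.0]])
kernel = np.ones((2, 2))                 # a learned kernel in a real FCN
upsampled = transposed_conv2d(coarse, kernel, stride=2)

# FCN skip fusion: element-wise addition with a same-sized shallow score map.
shallow = np.full((4, 4), 0.5)
fused = upsampled + shallow
```

This mirrors FCN-16s/8s: coarse deep predictions are upsampled and added to finer shallow predictions before the final upsampling to input resolution.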
## SegNet Network
Innovation points: encoder-decoder structure; pooling indices (max-pooling positions saved in the encoder and reused for upsampling in the decoder).
(Figure: comparison of the upsampling methods of SegNet and FCN)
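SegNet's pooling-indices upsampling can be sketched in NumPy (single channel, 2x2 windows; the helper names are hypothetical). Unlike FCN's learned transposed convolution, unpooling is parameter-free: each value returns to the exact position its max came from:

```python
import numpy as np

def max_pool_with_indices(x, k=2):
    """2x2 max pooling that also records the argmax position in each window."""
    h, w = x.shape
    pooled = np.zeros((h // k, w // k))
    indices = np.zeros((h // k, w // k), dtype=int)  # flat index inside window
    for i in range(h // k):
        for j in range(w // k):
            window = x[i*k:(i+1)*k, j*k:(j+1)*k]
            indices[i, j] = int(np.argmax(window))
            pooled[i, j] = window.flat[indices[i, j]]
    return pooled, indices

def max_unpool(pooled, indices, k=2):
    """SegNet decoder upsampling: place each value back at its recorded position."""
    h, w = pooled.shape
    out = np.zeros((h * k, w * k))
    for i in range(h):
        for j in range(w):
            di, dj = divmod(indices[i, j], k)
            out[i*k + di, j*k + dj] = pooled[i, j]
    return out

x = np.array([[1., 5., 2., 0.],
              [3., 4., 8., 6.],
              [7., 0., 1., 2.],
              [0., 9., 3., 4.]])
pooled, idx = max_pool_with_indices(x)
restored = max_unpool(pooled, idx)   # sparse map; conv layers then densify it
```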
## U-Net Network
Innovation points: U-shaped encoder-decoder structure; skip connections (channel-wise concatenation between encoder and decoder features).
## V-Net Network
Innovation point: essentially a 3D version of the U-Net network.
## FC-DenseNet (The One Hundred Layers Tiramisu)
Innovation point: a fusion of the DenseNet and U-Net designs (from the perspective of information flow, dense connections are indeed more powerful than residual structures).
## DeepLab Series
Cascaded atrous convolution
Parallel atrous convolution (ASPP)
4) DeepLabV3+: introduces the encoder-decoder idea, extending DeepLabV3 with a decoder module; applies depthwise separable convolution in the ASPP and decoder modules; uses an improved Xception as the backbone.
In general, the core contributions of the DeepLab series are: dilated (atrous) convolution; ASPP; and CNN + CRF (only V1 and V2 use a CRF; V3 and V3+ address blurred segmentation boundaries through deeper networks, which works better than adding a CRF).
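The key property of dilated convolution, enlarging the receptive field without adding parameters or losing resolution, can be shown with a minimal 1-D NumPy sketch (hypothetical helper; ASPP simply runs several such branches with different dilation rates in parallel and concatenates them):

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """'Valid' 1-D convolution with gaps of (dilation - 1) samples between taps."""
    taps = len(w)
    span = (taps - 1) * dilation + 1          # effective receptive field
    out = np.array([
        sum(w[t] * x[i + t * dilation] for t in range(taps))
        for i in range(len(x) - span + 1)
    ])
    return out, span

x = np.arange(10, dtype=float)
w = np.array([1.0, 1.0, 1.0])                 # same 3 weights in every case

out1, rf1 = dilated_conv1d(x, w, dilation=1)  # receptive field 3
out2, rf2 = dilated_conv1d(x, w, dilation=2)  # receptive field 5
out4, rf4 = dilated_conv1d(x, w, dilation=4)  # receptive field 9
```

Stacking such layers with exponentially growing dilation rates (the cascaded form) grows the receptive field exponentially while the parameter count grows only linearly.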
## PSPNet Network
Innovation point: multi-scale pooling (the pyramid pooling module), which better exploits global image-level prior knowledge for understanding complex scenes.
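The pyramid pooling idea can be sketched in NumPy: pool the feature map to several coarse grids, upsample each back to the original size, and stack everything along the channel axis (single-channel toy; helper names are hypothetical, and PSPNet additionally applies a 1x1 convolution per branch):

```python
import numpy as np

def adaptive_avg_pool(x, bins):
    """Average-pool a square feature map down to a bins x bins grid."""
    h = x.shape[0]
    out = np.zeros((bins, bins))
    for i in range(bins):
        for j in range(bins):
            r0, r1 = i * h // bins, (i + 1) * h // bins
            c0, c1 = j * h // bins, (j + 1) * h // bins
            out[i, j] = x[r0:r1, c0:c1].mean()
    return out

def nearest_upsample(x, size):
    """Nearest-neighbor upsampling back to size x size."""
    idx = (np.arange(size) * x.shape[0]) // size
    return x[np.ix_(idx, idx)]

feat = np.arange(36, dtype=float).reshape(6, 6)
# Pool at several scales (1x1 captures the global prior), upsample back,
# then stack with the original feature map along the channel axis.
pyramid = [nearest_upsample(adaptive_avg_pool(feat, b), 6) for b in (1, 2, 3)]
fused = np.stack([feat] + pyramid)    # shape: (channels, H, W)
```

The 1x1 branch is what injects the image-level global context that plain FCN-style networks lack.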
##RefineNet Network
Innovation point: Refine module1.3 Reduce the computational complexity of the network structure
Lightweight network design is an industry consensus. For mobile deployment, it is impossible to equip every device with a 2080 Ti; in addition, power consumption, storage, and similar constraints limit the deployment of large models. If 5G becomes ubiquitous, processing all data in the cloud would be an interesting alternative, but in the short term (say, ten years) it is unclear whether full-scale 5G deployment is feasible.
1.4 Network structure based on attention mechanism
1.5 Network structure based on adversarial learning
● G is a generative network: it receives random noise z and generates an image from that noise.
● D is a discriminative network: it judges whether an image is "real". Its input is an image x, and its output D(x) is the probability that x is a real image: 1 means certainly real, 0 means certainly fake.
G's training procedure maximizes the probability of D making an error. It can be proved that, over the space of arbitrary functions G and D, there is a unique solution in which G reproduces the training data distribution and D(x) = 0.5 everywhere. During training, the goal of the generator G is to produce images realistic enough to deceive the discriminator D, while the goal of D is to distinguish G's generated images from real ones. G and D thus form a dynamic "game", whose final equilibrium is a Nash equilibrium. When G and D are defined by neural networks, the whole system can be trained with backpropagation.
(Figure: GAN network structure)

Inspired by GANs, Luc et al. trained a semantic segmentation network (G) together with an adversarial network (D); the adversarial network distinguishes segmentation maps coming from the ground truth from those produced by the segmentation network (G). G and D play this game and learn from each other, with loss functions defined as:
(Figure: GAN loss function)
Reviewing the original GAN loss: the GAN loss function embodies the idea of a zero-sum game. The original GAN loss is as follows:

min_G max_D V(D, G) = E_{x∼p_data(x)}[log D(x)] + E_{z∼p_z(z)}[log(1 − D(G(z)))]
The loss is computed at the output of D (the discriminator), which is generally a fake/real judgment, so the loss as a whole can be viewed as a binary cross-entropy. From the form of the GAN loss, training is divided into two parts:
The first is the max_D part: training generally starts by updating D while keeping G (the generator) fixed. D's training goal is to correctly distinguish fake from real. If we use 1/0 to represent real/fake, then for the first expectation term, since the input x is sampled from real data, we want D(x) to approach 1, making the first term larger. Likewise, the input of the second expectation term is sampled from G's output, so we want D(G(z)) to approach 0, which also makes the second term larger. This part of training therefore tries to make the whole expression larger, which is the meaning of max_D. Only D's parameters are updated in this part.
The second part keeps D fixed (no parameter updates) and trains G. Now only the second expectation term matters. The key point: because we want to confuse D, the label of the generated samples is set to 1 (we know they are fake, which is why this is called confusing the discriminator). We want D(G(z)) to be close to 1, i.e. the term log(1 − D(G(z))) to be as small as possible; this is min_G. Of course, the discriminator is not so easily fooled, so it produces a relatively large error; that error updates G, and G improves: "I didn't fool you this time, so I will try harder next time." (Quoted from https://www.cnblogs.com/walter-xh/p/10051634.html.) Only G's parameters are updated in this part.
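The two-phase training above can be made concrete with a NumPy sketch of the losses at D's output (toy scalar probabilities standing in for network outputs; no actual networks or parameter updates, and the helper name is hypothetical):

```python
import numpy as np

def bce(pred, target):
    """Binary cross-entropy over a batch of discriminator outputs in (0, 1)."""
    pred = np.clip(pred, 1e-7, 1 - 1e-7)
    return -np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred))

d_real = np.array([0.9, 0.8])     # D(x) on real images (we want -> 1)
d_fake = np.array([0.2, 0.3])     # D(G(z)) on generated images

# Phase 1 (max over D, G frozen): label real samples 1 and fake samples 0.
d_loss = bce(d_real, np.ones(2)) + bce(d_fake, np.zeros(2))

# Phase 2 (min over G, D frozen): relabel the SAME fakes as 1 to "confuse" D;
# the resulting error backpropagates through D into G's parameters only.
g_loss = bce(d_fake, np.ones(2))

# At the Nash equilibrium D outputs 0.5 everywhere; its per-term loss is log 2.
eq_loss = bce(np.full(2, 0.5), np.ones(2))
```

Here g_loss is large because D still rejects the fakes; that gradient is exactly what pushes G to improve.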
Looking at GANs from another perspective, the discriminator (D) is equivalent to a special loss function (composed of a neural network, different from traditional L1, L2, cross-entropy and other loss functions).
In addition, GANs require a special training procedure and suffer from problems such as vanishing gradients and mode collapse (for which mitigation methods now exist), but the design idea is truly a great invention of the deep learning era.
Most of the image semantic segmentation models based on deep learning follow the encoder-decoder architecture, such as U-Net. Research results in recent years have shown that dilated convolution and feature pyramid pooling can improve U-Net style network performance. In Section 2, we summarize how these methods and their variants can be applied to medical image segmentation.
This section introduces some research results on the application of network structure innovation in 2D/3D medical image segmentation.
To process high-resolution 2D/3D medical images (CT, MRI, histopathology images, etc.) in real time, researchers have proposed a variety of model compression methods. Weng et al. applied NAS (neural architecture search) to the U-Net network and obtained a small network with better organ/tumor segmentation performance on CT, MRI, and ultrasound images. Brugger et al. redesigned the U-Net architecture using group normalization and Leaky ReLU, making the network more memory-efficient for 3D medical image segmentation. Others have designed dilated convolution modules with fewer parameters. Further model compression techniques include weight quantization (to sixteen-bit, eight-bit, or binary weights), distillation, and pruning.
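The eight-bit weight quantization mentioned above can be sketched in NumPy (a minimal affine-quantization toy on a random weight vector; real frameworks quantize per-channel and also quantize activations):

```python
import numpy as np

def quantize_uint8(w):
    """Affine 8-bit quantization: store weights as uint8 plus a scale and offset."""
    lo, hi = w.min(), w.max()
    scale = (hi - lo) / 255.0
    q = np.round((w - lo) / scale).astype(np.uint8)
    return q, scale, lo

def dequantize(q, scale, lo):
    """Recover approximate float weights for computation."""
    return q.astype(np.float32) * scale + lo

w = np.random.RandomState(0).randn(1000).astype(np.float32)
q, scale, lo = quantize_uint8(w)
w_hat = dequantize(q, scale, lo)

max_err = np.abs(w - w_hat).max()   # rounding error is bounded by scale / 2
```

Storage drops 4x versus float32 (and 2x versus the sixteen-bit case), at the cost of a small bounded rounding error per weight.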
Drozdzal et al. applied a simple CNN to normalize the raw input image before feeding it into the segmentation network, improving segmentation accuracy on electron microscopy images, liver CT, and prostate MRI. Gu et al. proposed using dilated convolutions in the backbone network to preserve contextual information. Vorontsov et al. proposed an image-to-image network framework that converts images with an ROI into images without it (for example, converting images with tumors into tumor-free "healthy" images), and then adds the removed tumors back onto the new healthy images to obtain a detailed structure of the object. Zhou et al. proposed rewiring the skip connections of the U-Net network and evaluated the result on nodule segmentation in chest low-dose CT scans, nucleus segmentation in microscopy images, liver segmentation in abdominal CT scans, and polyp segmentation in colonoscopy videos. Goyal et al. applied DeepLabV3 to dermoscopic color image segmentation to extract skin lesion regions.
Nie et al. proposed an attention model that segments the prostate more accurately than baseline models (V-Net and FCN). Sinha et al. proposed a network based on a multi-level attention mechanism for abdominal organ segmentation in MRI images. Qin et al. proposed a dilated convolution module to preserve more detail in 3D medical images. There are many other papers on attention-based medical image segmentation.
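The core of such attention mechanisms, re-weighting feature-map locations with a learned 0-to-1 mask, can be sketched in NumPy (a single-channel toy in the spirit of attention gating, not any specific paper's design; the helper names and the zero gating signal are hypothetical):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def attention_gate(features, gating):
    """Spatial attention: scale each location of a feature map by a 0-1
    coefficient computed from the features and a coarser gating signal."""
    # With one channel, the usual 1x1 convolutions collapse to a plain sum.
    mask = sigmoid(features + gating)     # (H, W) attention coefficients
    return features * mask, mask

feat = np.array([[2.0, -2.0],
                 [0.0,  4.0]])
gate = np.zeros((2, 2))                   # neutral gating signal for the toy
attended, mask = attention_gate(feat, gate)
```

Locations the mask scores near 1 pass through almost unchanged, while low-scoring locations (e.g. background clutter) are suppressed before the decoder uses the features.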
Khosravan et al. proposed an adversarially trained network for pancreas segmentation from CT scans. Son et al. used generative adversarial networks for retinal image segmentation. Xue et al. used a fully convolutional network as the segmentation network within a generative adversarial framework to segment brain tumors from MRI images. Other papers have also successfully applied GANs to medical image segmentation problems; we do not list them all here.
Recurrent neural networks (RNNs) are mainly used to process sequential data; the long short-term memory (LSTM) network is an improved RNN variant that introduces self-loops allowing gradients to flow over long durations. In medical image analysis, RNNs are used to model temporal dependencies in image sequences. Bin et al. proposed an image-sequence segmentation algorithm that combines a fully convolutional network with an RNN, incorporating information along the time dimension into the segmentation task. Gao et al. used a CNN and an LSTM to model temporal relationships across brain MRI slice sequences, improving segmentation performance on 4D images. Li et al. first used U-Net to obtain an initial segmentation probability map and then used an LSTM to segment the pancreas from 3D CT images, improving segmentation performance. Many other papers also use RNNs for medical image segmentation; we do not introduce them one by one.
This part has mainly covered applications of segmentation algorithms to medical images, so there are fewer novel ideas. Because different modalities (CT vs. RGB, pixel value range, image resolution, etc.) and different anatomical regions (noise, object shape, etc.) have different data characteristics, classic networks must be adapted to the input data's format and characteristics to perform segmentation well. Although deep learning is a black box, overall model design still follows discernible rules: knowing which strategies solve which problems, and what new problems they introduce, lets one choose appropriately for a specific segmentation task to achieve optimal performance.
References

1. Deep Semantic Segmentation of Natural and Medical Images: A Review.
2. NAS-Unet: Neural architecture search for medical image segmentation. IEEE Access, 7:44247–44257, 2019.
3. Boosting segmentation with weak supervision from image-to-image translation. arXiv preprint arXiv:1904.01636, 2019.
4. Multi-scale guided attention for medical image segmentation. arXiv preprint arXiv:1906.02849, 2019.
5. SegAN: Adversarial network with multi-scale L1 loss for medical image segmentation.
6. Fully convolutional structured LSTM networks for joint 4D medical image segmentation. In 2018 IEEE
7. https://www.cnblogs.com/walter-xh/p/10051634.html