A traditional GAN can be made interpretable after modification, ensuring both the interpretability of its convolution kernels and the authenticity of the generated images
Generative adversarial networks (GANs) have achieved great success in generating high-resolution images, and research on their interpretability has attracted widespread attention in recent years.
In this field, making a GAN learn a decoupled (often called disentangled) representation is still a major challenge. A decoupled representation is one in which each part of the representation affects only a specific aspect of the generated image. Previous research on decoupled representations in GANs has approached the problem from different perspectives.
For example, in Figure 1 below, Method 1 decouples the structure and style of the image, Method 2 learns the features of local objects in the image, and Method 3 learns decoupled features of image attributes, such as the age and gender attributes of face images. However, these studies fail to provide a clear, symbolic representation within the GAN for different visual concepts (for example, facial parts such as the eyes, nose, and mouth).
Figure 1: Visual comparison with other GAN decoupled representation methods
To this end, the researchers proposed a general method for modifying a traditional GAN into an interpretable GAN, which ensures that the convolution kernels in an intermediate layer of the generator learn decoupled local visual concepts. Specifically, as shown in Figure 2 below, and in contrast to a traditional GAN, each convolution kernel in the intermediate layer of an interpretable GAN always represents one specific visual concept when generating different images, and different convolution kernels represent different visual concepts.
Figure 2: Visual comparison of interpretable GAN and traditional GAN encoding representation
Modeling method
The learning of an interpretable GAN should meet two goals: interpretability of the convolution kernels and authenticity of the generated images.
To ensure the interpretability of the convolution kernels in the target layer, the researchers observed that when multiple convolution kernels generate similar regions corresponding to a certain visual concept, those kernels usually represent that concept jointly.
Therefore, they use a set of convolution kernels to jointly represent a specific visual concept, and use different sets of convolution kernels to represent different visual concepts respectively.
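The idea of grouping kernels that activate similar regions can be illustrated with a small sketch. The article does not say how the groups are actually formed, so the code below is only an assumption: it uses a simple k-means-style clustering on the cosine similarity of flattened feature maps, and the names `cosine` and `group_kernels` are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two flattened feature maps."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def group_kernels(feature_maps, num_groups, iterations=10):
    """Assign each kernel to a group whose members activate similar regions.

    feature_maps: list of flattened feature maps, one per kernel.
    Returns a list holding the group index of every kernel.
    """
    # Assumption: the first `num_groups` kernels serve as initial group centers.
    centers = [list(fm) for fm in feature_maps[:num_groups]]
    assignment = [0] * len(feature_maps)
    for _ in range(iterations):
        # Step 1: assign each kernel to the most similar group center.
        assignment = [
            max(range(num_groups), key=lambda g: cosine(fm, centers[g]))
            for fm in feature_maps
        ]
        # Step 2: recompute each center as the mean of its member maps.
        for g in range(num_groups):
            members = [fm for fm, a in zip(feature_maps, assignment) if a == g]
            if members:
                centers[g] = [sum(v) / len(members) for v in zip(*members)]
    return assignment

# Toy example: four kernels, each with a flattened 2x2 feature map.
maps = [
    [1.0, 0.9, 0.0, 0.0],  # activates the top row
    [0.0, 0.0, 1.0, 0.8],  # activates the bottom row
    [0.8, 1.0, 0.1, 0.0],  # top row again
    [0.0, 0.1, 0.9, 1.0],  # bottom row again
]
groups = group_kernels(maps, num_groups=2)
```

In this toy case, the two kernels that respond to the top row end up in one group and the two that respond to the bottom row in the other, which mirrors the notion of a group of kernels jointly representing one visual concept.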
To ensure the authenticity of the generated images at the same time, the researchers designed a loss function that modifies a traditional GAN into an interpretable GAN.
In the experiment, the researchers evaluated their interpretable GAN qualitatively and quantitatively.
For qualitative analysis, they visualized the feature map of each convolution kernel to evaluate how consistently the kernel represents the same visual concept across different images. As shown in Figure 3 below, in an interpretable GAN each convolution kernel always generates image regions corresponding to the same visual concept when generating different images, while different convolution kernels generate image regions corresponding to different visual concepts.
Figure 3: Visualization of feature maps in interpretable GAN
The experiments also compared the receptive field of the center of each group of convolution kernels with the receptive fields of the kernels in that group, as shown in Figure 4(a) below. Figure 4(b) shows the proportion of convolution kernels corresponding to each visual concept in the interpretable GAN. Figure 4(c) shows that the more groups the convolution kernels are divided into, the more fine-grained the visual concepts learned by the interpretable GAN.
Figure 4: Qualitative evaluation of interpretable GAN
An interpretable GAN also supports modifying specific visual concepts in the generated image. For example, a specific visual concept can be exchanged between images by swapping the corresponding feature maps in the interpretable layer, which achieves local or global face swapping.
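The feature-map exchange described above can be sketched as follows. This is a minimal illustration rather than the researchers' implementation: the function name `swap_concept`, the nested-list layout of the feature maps, and the `kernel_groups` labelling are all assumptions made for the example.

```python
import copy

def swap_concept(features_a, features_b, kernel_groups, concept_group):
    """Exchange the feature maps of one kernel group between two images.

    features_a, features_b: per-kernel feature maps (lists of 2-D lists)
        taken from the interpretable layer for two generated images.
    kernel_groups: group index of every kernel (hypothetical labelling).
    concept_group: index of the group whose concept (e.g. mouth) is swapped.
    Returns modified copies; the inputs are left untouched.
    """
    new_a = copy.deepcopy(features_a)
    new_b = copy.deepcopy(features_b)
    for k, group in enumerate(kernel_groups):
        if group == concept_group:
            # Only kernels belonging to the chosen concept are exchanged.
            new_a[k], new_b[k] = new_b[k], new_a[k]
    return new_a, new_b

# Toy example: three kernels with 1x2 feature maps; kernels 0 and 2 form group 0.
feats_a = [[[1, 1]], [[2, 2]], [[3, 3]]]
feats_b = [[[7, 7]], [[8, 8]], [[9, 9]]]
swapped_a, swapped_b = swap_concept(feats_a, feats_b, [0, 1, 0], concept_group=0)
```

In a real generator, the swapped feature maps would presumably be passed through the remaining layers to render the edited images; that step is omitted here because the article does not describe the architecture.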
Figure 5 below gives the results of swapping mouth, hair and nose between pairs of images. The last column gives the difference between the modified image and the original image. This result shows that the researchers' method only modified the local visual concept without changing other irrelevant areas.
Figure 5: Exchanging specific visual concepts to generate images
In addition, Figure 6 below shows the effect of their method when exchanging the entire face.
Figure 6: Swap the entire face of the generated image
For quantitative analysis, the researchers used face verification experiments to evaluate the accuracy of the face-swapping results. Specifically, given a pair of face images, the face in the original image is replaced with the face from the source image to generate a modified image. The test then checks whether the face in the modified image and the face in the source image have the same identity.
Table 1 below shows the face verification accuracy of different methods. The researchers' method outperforms other face-swapping methods in terms of identity preservation.
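The verification step can be illustrated with a short sketch. The article does not name the face verification model or decision rule that was used, so the identity embeddings, the cosine-similarity comparison, and the `threshold` value below are all hypothetical stand-ins.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two identity embeddings."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def verification_accuracy(modified_embeddings, source_embeddings, threshold=0.5):
    """Fraction of (modified, source) pairs judged to share an identity.

    A pair counts as the same identity when the cosine similarity of its
    embeddings reaches `threshold` (an assumed value, not from the article).
    """
    matches = sum(
        1
        for m, s in zip(modified_embeddings, source_embeddings)
        if cosine_similarity(m, s) >= threshold
    )
    return matches / len(modified_embeddings)

# Toy embeddings: the first and third pairs are close, the second is orthogonal.
modified = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
source = [[1.0, 0.1], [1.0, 0.0], [1.0, 0.9]]
accuracy = verification_accuracy(modified, source)
```

Here two of the three pairs pass the threshold, so the accuracy is two thirds. In practice the embeddings would come from a pretrained face recognition network.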
Table 1: Accuracy evaluation of face-swapping identity
In addition, the experiments evaluated the locality of the method when modifying specific visual concepts. Specifically, the researchers calculated the mean squared error (MSE) between the original image and the modified image in RGB space, and used the ratio of the out-of-region MSE to the in-region MSE of a specific visual concept as the metric for locality.
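The locality metric described above can be written out as a small sketch. The image layout (nested lists of RGB tuples), the boolean `region_mask` marking the edited concept's region, and the function name `locality_ratio` are assumptions for illustration; the article does not say how the region is determined.

```python
def locality_ratio(original, modified, region_mask):
    """Ratio of out-of-region MSE to in-region MSE in RGB space.

    original, modified: images as H x W lists of (R, G, B) tuples.
    region_mask: H x W list of booleans, True inside the edited concept's region.
    """
    in_sum = out_sum = 0.0
    in_count = out_count = 0
    for row_o, row_m, row_mask in zip(original, modified, region_mask):
        for px_o, px_m, inside in zip(row_o, row_m, row_mask):
            # Squared error summed over the three colour channels.
            sq_err = sum((a - b) ** 2 for a, b in zip(px_o, px_m))
            if inside:
                in_sum += sq_err
                in_count += 3
            else:
                out_sum += sq_err
                out_count += 3
    if in_count == 0 or out_count == 0 or in_sum == 0:
        raise ValueError("both regions must be non-empty and the edit non-trivial")
    return (out_sum / out_count) / (in_sum / in_count)

# Toy 2x2 image: a large change inside the region, a small leak outside it.
black = (0, 0, 0)
original = [[black, black], [black, black]]
modified = [[(30, 30, 30), (3, 3, 3)], [black, black]]
mask = [[True, False], [False, False]]
ratio = locality_ratio(original, modified, mask)
```

A smaller ratio means that the change is more concentrated inside the region of the edited concept, that is, the edit is more local.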
The results are shown in Table 2 below. The researchers' modification method has better locality; that is, the areas of the image outside the modified visual concept change less.
Table 2: Locality evaluation of modified visual concepts
For more experimental results, please see the paper.
This work proposes a general method that can modify traditional GANs into interpretable GANs without any manual annotation of visual concepts. In an interpretable GAN, each convolution kernel in the middle layer of the generator can stably generate the same visual concept when generating different images.
Experiments show that interpretable GAN also enables people to modify specific visual concepts on the generated images, providing a new perspective on the controllable editing method of GAN-generated images.