


Image inpainting refers to completing the missing regions of an image and is one of the fundamental tasks of computer vision. It has many practical applications, such as object removal, image retargeting, and image synthesis.
Early inpainting methods filled in missing parts of an image by patch-based synthesis or color diffusion. To handle more complex image structures, researchers turned to data-driven approaches that use deep generative networks to predict visual content and appearance. Trained on large image collections with reconstruction and adversarial losses, generative inpainting models have been shown to produce more visually appealing results on various types of input, including natural images and human faces.
However, existing methods only produce good results on simple image structures; generating content with complex overall structure and high-fidelity detail remains a major challenge, especially when the image holes are large.
Essentially, image inpainting faces two key challenges: how to accurately propagate global context into the incomplete region, and how to synthesize realistic local details consistent with the global cues. To address global context propagation, existing networks use encoder-decoder structures, atrous convolutions, contextual attention, or Fourier convolutions to integrate long-range feature dependencies and expand the effective receptive field. In addition, two-stage approaches and iterative hole filling rely on predicting coarse results to strengthen the global structure. However, these models lack a mechanism to capture the high-level semantics of the unmasked region and effectively propagate them into the hole to synthesize a coherent overall structure.
Based on this, researchers from the University of Rochester and Adobe Research proposed a new generative network, CM-GAN (Cascaded Modulation GAN), which better synthesizes both the overall structure and local details. CM-GAN includes an encoder with Fourier convolution blocks that extracts multi-scale feature representations from the input image with holes, as well as a dual-stream decoder that places a novel cascaded global-spatial modulation block at each scale level.
In each decoder block, global modulation is first applied to perform coarse, semantically aware structure synthesis, and spatial modulation then further adjusts the feature maps in a spatially adaptive manner. In addition, the study designs an object-aware training scheme that prevents artifacts inside holes, meeting the needs of object removal in real-world scenarios. Extensive experiments show that CM-GAN significantly outperforms existing methods in both quantitative and qualitative evaluations.
- Paper address: https://arxiv.org/pdf/2203.11947.pdf
- Project address: https://github.com/htzheng/CM-GAN-Inpainting
Let's first look at the inpainting results. Compared with other methods, CM-GAN reconstructs better textures:
CM-GAN also produces better object boundaries:
Let’s take a look at the research methods and experimental results.
Method
Cascaded Modulation GAN
To better model the global context for image completion, this study proposes a new mechanism that cascades global code modulation with spatial code modulation. This mechanism handles partially invalid features while better injecting global context into the spatial domain. The new architecture, CM-GAN, synthesizes both the overall structure and local details well, as shown in Figure 1 below.
As shown in Figure 2 (left) below, CM-GAN is based on one encoder branch and two parallel cascaded decoder branches that generate the visual output. The encoder takes the partial image and the mask as input and produces multi-scale feature maps.
Unlike most encoder-decoder methods, in order to complete the overall structure, this study extracts a global style code s from the highest-level features via a fully connected layer, followed by normalization. Additionally, an MLP-based mapping network generates a style code w from noise to model the stochasticity of image generation. The code w is combined with s to produce a global code g = [s; w], which is used in subsequent decoding steps.
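The construction of the global code described above can be sketched as follows. This is a minimal, illustrative numpy sketch, not the paper's implementation: the dimensions (512 for both s and w) and the single-layer `mapping_network` are assumptions standing in for the real fully connected layer and deeper MLP.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(v, eps=1e-8):
    # Normalize the pooled encoder features into a unit-length style code s.
    return v / (np.linalg.norm(v) + eps)

def mapping_network(z, weights):
    # Stand-in for the MLP mapping network: one linear layer + ReLU.
    # (The real mapping network is deeper; this is only illustrative.)
    return np.maximum(z @ weights, 0.0)

# Hypothetical dimensions: 512-d style code s, 512-d noise code w.
s_raw = rng.normal(size=512)            # pooled highest-level encoder features
s = l2_normalize(s_raw)                 # normalized global style code s
z = rng.normal(size=512)                # input noise
W = rng.normal(size=(512, 512)) * 0.01
w = mapping_network(z, W)               # style code w from noise

g = np.concatenate([s, w])              # global code g = [s; w]
print(g.shape)                          # (1024,)
```

The concatenated code g then conditions every decoder block, so both the deterministic image content (via s) and the sampled randomness (via w) steer the synthesis.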
Global-spatial cascaded modulation. To better bridge the global context during decoding, this study proposes global-spatial cascaded modulation (CM). As shown in Figure 2 (right), the decoding stage is based on two branches, a global modulation block (GB) and a spatial modulation block (SB), which upsample the global features F_g and local features F_s in parallel.
Unlike existing methods, CM-GAN introduces a new way of injecting global context into the hole region. Conceptually, it consists of cascaded global and spatial modulation between features at each scale, and naturally integrates three mechanisms for global context modeling: 1) feature upsampling; 2) global modulation; 3) spatial modulation.
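The three mechanisms above can be sketched for a single decoder block. This is a heavily simplified numpy sketch under stated assumptions: nearest-neighbour upsampling stands in for the learned upsampling, the channel-wise affine from g stands in for the paper's global modulation, and a tanh of the global branch stands in for the predicted spatial modulation tensor; the real blocks also contain convolutions omitted here.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbour 2x upsampling over the spatial axes of a (C, H, W) map.
    return x.repeat(2, axis=1).repeat(2, axis=2)

def global_modulation(feat, g, W_scale, W_shift):
    # Channel-wise affine modulation whose scale/shift are predicted from g.
    scale = (g @ W_scale).reshape(-1, 1, 1)
    shift = (g @ W_shift).reshape(-1, 1, 1)
    return feat * (1.0 + scale) + shift

def spatial_modulation(feat, spatial_map):
    # Element-wise, spatially adaptive modulation: a per-pixel tensor
    # (here derived from the global branch) rescales the local features.
    return feat * spatial_map

rng = np.random.default_rng(0)
C, H, W = 8, 4, 4
g = rng.normal(size=16)                  # global code g = [s; w]
F_g = rng.normal(size=(C, H, W))         # global feature branch
F_s = rng.normal(size=(C, H, W))         # spatial (local) feature branch

W_scale = rng.normal(size=(16, C)) * 0.1
W_shift = rng.normal(size=(16, C)) * 0.1

# One cascaded block: upsample both branches, globally modulate F_g,
# then use the result to spatially modulate F_s.
F_g_up = global_modulation(upsample2x(F_g), g, W_scale, W_shift)
F_s_up = spatial_modulation(upsample2x(F_s), np.tanh(F_g_up))
print(F_g_up.shape, F_s_up.shape)
```

The cascade is the key design choice: the spatial branch is modulated by the already globally modulated branch, so global semantics reach every pixel before local details are refined.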
Object-Aware Training
The algorithm used to generate masks during training is crucial. Essentially, the sampled masks should resemble the masks drawn in real use cases, and they should avoid covering entire objects or large portions of any object. Oversimplified masking schemes can lead to artifacts.
To better support real object-removal use cases while preventing the model from synthesizing new objects inside holes, this study proposes an object-aware training scheme that generates more realistic masks, as shown in Figure 4 below.
Specifically, the study first passes the training images through the panoptic segmentation network PanopticFCN to generate highly accurate instance-level segmentation annotations, then samples a mixture of free-form holes and object-shaped holes as the initial mask, and finally computes the overlap ratio between the hole and each instance in the image. If the overlap ratio is greater than a threshold (set to 0.5), the foreground instance is excluded from the hole; otherwise, the hole is left unchanged, simulating object completion. The study randomly dilates and translates object masks to avoid overfitting, and additionally enlarges holes along instance segmentation boundaries to avoid leaking background pixels near the hole into the inpainted region.
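The overlap-ratio rule described above can be sketched as follows. This is a simplified illustration with hypothetical boolean masks; the actual pipeline operates on PanopticFCN outputs and also applies the random dilation, translation, and boundary enlargement steps, which are omitted here.

```python
import numpy as np

def exclude_covered_objects(hole, instance_masks, threshold=0.5):
    """Object-aware hole refinement (simplified sketch).

    hole:           boolean (H, W) array, True where pixels are missing.
    instance_masks: list of boolean (H, W) instance masks from segmentation.

    For each instance, compute the fraction of the instance covered by the
    hole; if it exceeds the threshold, carve the instance out of the hole
    so the model never has to hallucinate a large missing object.
    """
    refined = hole.copy()
    for inst in instance_masks:
        overlap = np.logical_and(refined, inst).sum() / max(inst.sum(), 1)
        if overlap > threshold:
            refined[inst] = False   # exclude the foreground instance
    return refined

# Toy example: a 6x6 image with one object fully inside the hole.
hole = np.zeros((6, 6), dtype=bool)
hole[0:4, 0:4] = True
obj = np.zeros((6, 6), dtype=bool)
obj[1:3, 1:3] = True              # fully covered, overlap ratio = 1.0
refined = exclude_covered_objects(hole, [obj])
print(refined[1:3, 1:3].any())    # False: the object is carved out
```

A partially overlapped instance (ratio below 0.5) would be left inside the hole, so the network still learns to complete objects that are only partly missing.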
Training objective with masked-R_1 regularization
The model is trained with a combination of adversarial loss and segmentation-based perceptual loss. Experiments show that good results can be achieved with the adversarial loss alone, but adding the perceptual loss further improves performance.
In addition, the study proposes a masked-R_1 regularization tailored to stabilize adversarial training for inpainting, in which the mask m is used to avoid computing the gradient penalty outside the mask.
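The masked penalty can be sketched as below. This is an illustrative numpy sketch, not the training code: the discriminator gradient is replaced by a random stand-in tensor (in practice it comes from autograd), and the convention that m is 1 inside the hole and the weight gamma = 10 are assumptions for the example.

```python
import numpy as np

def masked_r1_penalty(grad_real, mask, gamma=10.0):
    # Standard R_1 penalizes the squared gradient norm of the discriminator
    # w.r.t. real images; the masked variant multiplies the gradient by the
    # mask m first, so only the masked region contributes to the penalty.
    masked_grad = grad_real * mask
    return 0.5 * gamma * np.sum(masked_grad ** 2)

rng = np.random.default_rng(0)
grad = rng.normal(size=(1, 3, 4, 4))   # stand-in for dD/dx on a real image
mask = np.zeros((1, 1, 4, 4))
mask[..., :2, :2] = 1.0                # assume m = 1 inside the hole

full = masked_r1_penalty(grad, np.ones_like(mask))
masked = masked_r1_penalty(grad, mask)
print(masked <= full)                  # True: outside-mask gradients drop out
```

Restricting the penalty to the masked region keeps the regularizer focused on the area the generator must synthesize, which is what stabilizes adversarial training for inpainting.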
Experiment
This study conducted inpainting experiments on the Places2 dataset at 512 × 512 resolution and reports quantitative and qualitative evaluations of the model.
Quantitative evaluation: Table 1 below compares CM-GAN with other methods under different masks. The results show that CM-GAN significantly outperforms the other methods in terms of FID, LPIPS, U-IDS, and P-IDS. Aided by the perceptual loss, LaMa and CM-GAN achieve significantly better LPIPS scores than CoModGAN and the other methods, thanks to the additional semantic guidance provided by the pre-trained perceptual model. Compared to LaMa/CoModGAN, CM-GAN reduces FID from 3.864/3.724 to 1.628.
As shown in Table 3 below, with or without fine-tuning, CM-GAN evaluated on the LaMa and CoModGAN masks achieves significantly better performance than LaMa and CoModGAN respectively, indicating that the model generalizes across mask types. Notably, CM-GAN trained on the CoModGAN masks or the object-aware masks still outperforms CoModGAN, confirming that CM-GAN has stronger generation ability.
To verify the importance of each component of the model, the study conducted a set of ablation experiments; all models were trained and evaluated on the Places2 dataset. The results are shown in Table 2 and Figure 7 below.
The above is the detailed content of Even if a large area of the image is missing, it can be restored realistically: the new model CM-GAN takes into account both global structure and texture details. For more information, please follow other related articles on the PHP Chinese website!



