DeepFake has never been so real! How strong is Nvidia's latest "implicit warping"?

In recent years, generative techniques in computer vision have become more and more powerful, and the corresponding "forgery" techniques have matured along with them. From DeepFake face swapping to motion imitation, it is getting hard to tell the real from the fake.

NVIDIA recently made another big move, presenting a new Implicit Warping framework at the NeurIPS 2022 conference. It creates a target animation from a set of source images and the motion of a driving video.

## Paper link: https://arxiv.org/pdf/2210.01794.pdf

In terms of results, the generated images are more realistic, and the background stays unchanged when people move in the video.

Multiple source images usually provide different appearance information, which reduces the room the generator has to "hallucinate". In the example, two images are used as model input.

Compared with other models, implicit warping does not produce the kind of spatial distortion seen in beauty filters.

Because people occlude parts of the scene, multiple source images can also provide a more complete background.

As the video shows, with only the source image on the left it is hard to guess whether the text in the background reads "BD" or "ED", which causes background distortion. With two source images, the generated output is more stable.

Compared with other models, the results are also better when only one source image is used.

The magic of implicit warping

Academic interest in video imitation dates back to 2005. Many projects followed, including real-time facial reenactment and expression transfer, Face2Face, Synthesizing Obama, Recycle-GAN, ReenactGAN and dynamic neural radiance fields, which diversified the use of the few techniques available at the time, such as generative adversarial networks (GANs), neural radiance fields (NeRF) and autoencoders.

Not all methods try to generate video from a single image. Some studies perform complex computation on every frame of the video, which is the imitation route that DeepFake itself takes.

However, because a DeepFake model has less information to work with, this approach requires training for each video clip, and its performance falls short of the open-source DeepFaceLab and FaceSwap, both of which can impose an identity onto any number of video clips.

The FOMM model (First Order Motion Model), released in 2019, lets a character move along with a driving video and gave the video imitation task another shot in the arm.

Other researchers subsequently attempted to derive multiple poses and expressions from a single face image or full-body representation. However, this approach generally only worked for relatively expressionless and immobile subjects, such as a fairly stationary "talking head", because there are no "sudden changes in behavior" in facial expression or gesture that the network has to interpret.

Although some of these techniques gained public attention before deepfakes and latent diffusion image synthesis became popular, their applicability was limited and their versatility was questioned.

The implicit warping that NVIDIA focuses on this time obtains information from multiple frames, or even from just two frames, rather than deriving all the necessary pose information from a single frame. This setup is either absent from competing models or handled very poorly by them.

For example, Disney's workflow is that senior animators draw the main frames and key frames, and other junior animators are responsible for drawing intermediate frames.

In tests of earlier approaches, NVIDIA researchers found that the quality of their results deteriorated as more "keyframes" were added. The new method, by contrast, is consistent with the logic of animation production: performance improves in a linear fashion as the number of keyframes increases.

If there is a sudden change in the middle of a clip, such as an event or expression that appears in neither the start frame nor the end frame, implicit warping allows one more frame to be added at that midpoint, and the additional information is fed into the attention mechanism for the entire clip.

Model structure

Previous methods such as FOMM, Monkey-Net and face-vid2vid use explicit warping to draw a time series, and the information extracted from the source face and the controlling motion must be adapted to, and stay consistent with, this time series.

Under this model design, the final mapping of key points is quite strict.

In contrast, implicit warping uses a cross-modal attention layer with less predefined bootstrapping in its workflow, and it can adapt to input from multiple frames.

The workflow also does not require warping on a per-keypoint basis; the system can select the most appropriate features from a series of images.
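
The idea of picking features from several source images through attention can be illustrated with a minimal sketch. This is not NVIDIA's implementation: the function names, the two-dimensional toy features and the two hypothetical source frames are assumptions made purely for illustration. A query derived from the driving frame is compared against keys pooled from all source frames, and the output is the softmax-weighted mix of the matching values.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

# Hypothetical toy features: two source frames with two locations each.
frame_a_keys = [[1.0, 0.0], [0.0, 1.0]]
frame_b_keys = [[4.0, 4.0], [-1.0, 0.0]]
frame_a_vals = [[10.0], [20.0]]
frame_b_vals = [[30.0], [40.0]]

# Keys and values from all frames are simply concatenated, so adding a
# frame only lengthens these lists.
keys = frame_a_keys + frame_b_keys
values = frame_a_vals + frame_b_vals

query = [2.0, 2.0]  # hypothetical feature for one driving-frame location
output, weights = cross_attention(query, keys, values)
best = max(range(len(weights)), key=weights.__getitem__)
print(best, output)
```

Here the query matches the first location of the second frame most strongly, so the output is dominated by that location's value. No per-keypoint warp is computed anywhere in the sketch.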

Implicit warping also reuses some keypoint prediction components from the FOMM framework, and then uses a simple U-net to encode the derived spatial representation of the driving keypoints. A separate U-net encodes the source image together with the derived spatial representation. Both networks can operate at resolutions ranging from 64px (for 256px square output) to 384x384px.

Because this mechanism cannot automatically account for every possible change in pose and movement in a given video, additional keyframes are necessary and can be added ad hoc. Without this ability to intervene, keys that are not similar enough to the target motion point would still be picked automatically, lowering the output quality.

The researchers explain that a key may be the most similar one to the query within a given set of keyframes and still not be good enough to produce a good output.

For example, suppose the source image shows a face with closed lips, and the driving image shows a face with open lips and exposed teeth. In this case, there is no appropriate key (and value) in the source image to drive the mouth region of the image.

This method overcomes this problem by learning additional image-independent key-value pairs, which can cope with the lack of information in the source image.
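
A toy sketch of this fallback follows, reusing the closed-lips example. It assumes plain scaled dot-product attention; the vectors, the `attend` helper and the hard-coded "extra" pair are hypothetical stand-ins for parameters that a real model would learn during training.

```python
import math

def attend(query, keys, values):
    """Scaled dot-product attention for one query; returns (output, weights)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    out = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return out, weights

# Hypothetical source image: it only contains a "closed lips" feature.
source_keys = [[1.0, 0.0]]
source_values = [[0.0]]   # 0.0 stands for "closed mouth" appearance

# Hypothetical image-independent pair; in a real model these numbers
# would be learned parameters, not hand-written constants.
extra_keys = [[0.0, 5.0]]
extra_values = [[1.0]]    # 1.0 stands for "open mouth with teeth"

query_open_mouth = [0.0, 3.0]  # driving frame asks for an open mouth

# With source keys only, the single poor match still gets all the weight.
out_without, _ = attend(query_open_mouth, source_keys, source_values)

# With the extra pair appended, attention shifts to the learned entry.
out_with, w = attend(query_open_mouth,
                     source_keys + extra_keys,
                     source_values + extra_values)
print(out_without, out_with)
```

Without the extra pair, softmax has to put all its weight on the only available key, however poorly it matches. Appending the extra pair gives the query somewhere better to go.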

Although the current implementation is quite fast, at around 10 FPS on 512x512px images, the researchers believe that a future version of the pipeline could be optimized with a factorized 1-D attention layer or a Spatial Reduction Attention (SRA) layer (as used in Pyramid Vision Transformer).
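
To see why spatial reduction helps, here is a rough count of query-key score entries, assuming the keys are downsampled by a factor R along each spatial axis. The 64x64 feature map and the helper name are hypothetical and not taken from the paper. The count also ignores the cost of the reduction step itself.

```python
def attention_pairs(height, width, reduction=1):
    """Number of query-key score entries for one attention layer.

    Queries keep full resolution; keys are assumed to be downsampled by
    `reduction` along each spatial axis.
    """
    queries = height * width
    keys = (height // reduction) * (width // reduction)
    return queries * keys

full = attention_pairs(64, 64)                  # 4096 queries x 4096 keys
reduced = attention_pairs(64, 64, reduction=4)  # 4096 queries x 256 keys
print(full, reduced, full // reduced)
```

Under these assumptions, a reduction factor of 4 shrinks the score matrix by a factor of 16.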

Because implicit warping uses global attention instead of local attention, it can predict factors that previous models cannot predict.

Experimental results

The researchers tested the system on the VoxCeleb2 dataset, the more challenging TED Talk dataset and the TalkingHead-1KH dataset, comparing against baselines at 256x256px and at full 512x512px resolution, using metrics including FID, AlexNet-based LPIPS and peak signal-to-noise ratio (PSNR).

The competing frameworks used for testing include FOMM and face-vid2vid, as well as AA-PCA. Because previous methods have little or no ability to use multiple keyframes, which is the main innovation of implicit warping, the researchers also designed comparable testing methods.
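
Of the three metrics, PSNR is the simplest to state: it is 10 * log10(MAX^2 / MSE), where MSE is the mean squared error between the reference and the generated image. A minimal sketch on flat lists of 8-bit pixel values follows. The function name and sample pixels are illustrative, and this is not the paper's evaluation code.

```python
import math

def psnr(reference, generated, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    if not reference or len(reference) != len(generated):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Hypothetical 4-pixel "images".
ref = [0, 128, 255, 64]
gen = [0, 130, 250, 64]
score = psnr(ref, gen)
print(round(score, 2))
```

Higher PSNR means the reconstruction is closer to the reference. FID and LPIPS rely on pretrained networks and are not sketched here.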

Implicit warping outperforms most contrasting methods on most metrics.

In the multi-keyframe reconstruction test, the researchers used sequences of up to 180 frames and selected frames at intervals from them. Implicit warping came out ahead overall in this test.

As the number of source images increases, this method achieves better reconstruction results, and the scores on all metrics improve.

For previous work, by contrast, reconstruction quality gets worse as the number of source images increases, contrary to expectations.

In a qualitative study conducted with Amazon Mechanical Turk (AMT) workers, the results generated by implicit warping were also rated above those of the other methods.

Having access to this framework would allow users to create longer and more coherent video simulations and full-body deepfake videos, all while exhibiting a much greater range of motion than any of the frameworks the system has been tested against.

But research into more realistic image synthesis also raises concerns, because these techniques can easily be used for forgery, and the paper includes the standard disclaimer.

The paper notes that if the method is used to create deepfakes, it may have a negative impact: using cross-identity transfer and speech synthesis, a malicious actor could create fake videos of a person, leading to identity theft or the spread of fake news. In controlled settings, however, the same technology can also be used for entertainment.

The paper also points out the potential of this system for neural video reconstruction, as in Google's Project Starline. In such a framework, the reconstruction work happens mainly on the client side, using sparse motion information from the person on the other end.

This approach has attracted growing interest from the research community, and some companies intend to implement low-bandwidth conference calls by sending pure motion data or sparsely spaced keyframes. These keyframes would be interpreted and interpolated into full HD video upon reaching the target client.

Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it deleted.