search
HomeTechnology peripheralsAIAI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.

After Sora, there is actually a new AI video model, which is so amazing that everyone likes it and praises it!

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Pictures

With it, the villain of "Kuronics" Gao Qiqiang transforms into Luo Xiang, and he can educate everyone (dog head).

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.

This is Alibaba’s latest audio-driven portrait video generation framework, EMO (Emote Portrait Alive).

With it, you can generate an AI video with vivid expressions by inputting a single reference image and a piece of audio (speech, singing, or rap can be used). The final length of the video depends on the length of the input audio.

You can ask Mona Lisa, a veteran contestant of AI effects experience, to recite a monologue:

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.

Here comes the young and handsome little plum. During this fast-paced RAP talent show, there was no problem keeping up with the mouth shape:

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.

Even the Cantonese lip-syncs could be held, which allowed my brother Leslie Cheung to sing Eason Chan's " Unconditional》:

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.

#In short, whether it is to make the portrait sing (different styles of portraits and songs), to make the portrait speak (different languages), or to make all kinds of "pretentious" The cross-actor performance and the EMO effect left us stunned for a moment.

Netizens lamented: "We are entering a new reality!"

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.The 2019 version of "Joker" said the lines of the 2008 version of "The Dark Knight"

Some netizens have even started to pull videos of EMO generated videos and analyze the effect frame by frame.

As shown in the video below, the protagonist is the AI ​​lady generated by Sora. The song she sang for you this time is "Don’t Start Now".

Commenters analyzed:

The consistency of this video is even better than before!
In the more than one minute video, the sunglasses on Ms. Sora’s face barely moved, and her ears and eyebrows moved independently.
The most exciting thing is that Ms. Sora’s throat seems to be really breathing! Her body trembled and moved slightly while singing, which shocked me!

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.##Picture

Having said that, EMO is a hot new technology, so it is inevitable to compare it with similar products——

Yesterday, the AI ​​video generation company Pika also launched a lip synchronization function for dubbing video characters and "lip syncing" at the same time, which was a big hit.

What is the specific effect? ​​We will put it here directly

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.

After comparison, netizens in the comment area came to the conclusion that they were beaten by Ali.

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

EMO released the paper and announced it was open source.

but! Although it is open source, there are still short positions on GitHub.

But again! Although it is a short position, the number of stars has exceeded 2.1k.

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

It made netizens really anxious, as anxious as King Jiji.

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.

Different architecture from Sora

As soon as the EMO paper came out, many people in the circle breathed a sigh of relief.

It is different from Sora’s technical route, which shows that copying Sora is not the only way.

EMO is not based on a DiT-like architecture, that is, Transformer is not used to replace traditional UNet. Its backbone network is modified from Stable Diffusion 1.5.

Specifically, EMO is an expressive audio-driven portrait video generation framework that can generate videos of any duration based on the length of the input video.

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

The framework mainly consists of two stages:

  • Frame encoding stage

Deploy a UNet network called ReferenceNet, which is responsible for extracting features from reference images and frames of videos.

  • Diffusion stage

First, the pre-trained audio encoder processes the audio embedding, and the face region mask is combined with multi-frame noise to control the generation of the face image .

The backbone network then leads the denoising operation. Two types of attention are applied in the backbone network, reference attention and audio attention, which serve to maintain the identity consistency of the character and regulate the movement of the character respectively.

Additionally, the time module is used to manipulate the time dimension and adjust the speed of movement.

In terms of training data, the team built a large and diverse audio and video data set containing more than 250 hours of video and more than 15 million images.

The specific features of the final implementation are as follows:

  • Videos of any duration can be generated based on the input audio while ensuring character identity consistency (the longest single video given in the demonstration is 1 minute 49 seconds).
  • Supports talking and singing in various languages ​​(the demo includes Mandarin, Cantonese, English, Japanese, Korean)
  • Supports different painting styles (photos, traditional paintings, comics, 3D rendering, AI digital person)

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

The quantitative comparison is also greatly improved compared to the previous method to obtain SOTA, only measuring mouth shape The SyncNet indicator of synchronization quality is slightly inferior.

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

Compared with other methods that do not rely on diffusion models, EMO is more time-consuming.

And since no explicit control signals are used, which may lead to the inadvertent generation of other body parts such as hands, a potential solution is to use control signals specifically for body parts.

EMO’s team

Finally, let’s take a look at the people on the team behind EMO.

The paper shows that the EMO team comes from Alibaba Intelligent Computing Research Institute.

There are four authors, namely Linrui Tian, ​​Qi Wang, Bang Zhang and Liefeng Bo.

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

Among them, Liefeng Bo is the current head of the XR laboratory of Alibaba Tongyi Laboratory.

Dr. Bo Liefeng graduated from Xi'an University of Electronic Science and Technology. He has engaged in postdoctoral research at Toyota Research Institute of the University of Chicago and the University of Washington. His research directions are mainly ML, CV and robotics. Its Google Scholar citations exceed 13,000.

Before joining Alibaba, he first served as chief scientist at Amazon’s Seattle headquarters, and then joined JD Digital Technology Group’s AI laboratory as chief scientist.

In September 2022, Bo Liefeng joined Alibaba.

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

EMO is not the first time Alibaba has achieved success in the AIGC field.

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

OutfitAnyone with AI one-click dress-up.

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Pictures

There is also AnimateAnyone, which makes cats and dogs all over the world dance the bath dance.

This is the one below:

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

Now that EMO is launched, many netizens are lamenting that Alibaba has accumulated some technology on it .

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

If all these technologies are combined now, the effect will be...

I don’t dare to think about it, but I’m really looking forward to it.

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

In short, we are getting closer and closer to "send a script to AI and output the entire movie".

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

One More Thing

Sora represents a cliff-edge breakthrough in text-driven video synthesis.

EMO also represents a new level of audio-driven video synthesis.

Although the two tasks are different and the specific architecture is different, they still have one important thing in common:

There is no explicit physical model in the middle, but they both simulate physical laws to a certain extent. .

Therefore, some people believe that this is contrary to Lecun's insistence that "modeling the world for actions by generating pixels is wasteful and doomed to failure" and supports Jim Fan's "data-driven world model" idea. .

AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.Picture

Various methods have failed in the past, but the current success may really come from "Bitter Lessons" by Sutton, the father of reinforcement learning. Vigorously miracle.

Enabling AI to discover like people, rather than containing what people discover

Breakthrough progress is ultimately achieved by scaling up computing

Paper: https://www.php.cn/link/a717f41c203cb970f96f706e4b12617bGitHub:https://www.php.cn/link/e43a09ffc30b44cb1f0db46f87836f40

Reference Link:
[1]https://www.php.cn/link/0dd4f2526c7c874d06f19523264f6552

##

The above is the detailed content of AI video explodes again! Photo + voice turned into video, Alibaba asked the heroine Sora to sing and rap with Li Zi.. For more information, please follow other related articles on the PHP Chinese website!

Statement
This article is reproduced at:51CTO.COM. If there is any infringement, please contact admin@php.cn delete
ai合并图层的快捷键是什么ai合并图层的快捷键是什么Jan 07, 2021 am 10:59 AM

ai合并图层的快捷键是“Ctrl+Shift+E”,它的作用是把目前所有处在显示状态的图层合并,在隐藏状态的图层则不作变动。也可以选中要合并的图层,在菜单栏中依次点击“窗口”-“路径查找器”,点击“合并”按钮。

ai橡皮擦擦不掉东西怎么办ai橡皮擦擦不掉东西怎么办Jan 13, 2021 am 10:23 AM

ai橡皮擦擦不掉东西是因为AI是矢量图软件,用橡皮擦不能擦位图的,其解决办法就是用蒙板工具以及钢笔勾好路径再建立蒙板即可实现擦掉东西。

谷歌超强AI超算碾压英伟达A100!TPU v4性能提升10倍,细节首次公开谷歌超强AI超算碾压英伟达A100!TPU v4性能提升10倍,细节首次公开Apr 07, 2023 pm 02:54 PM

虽然谷歌早在2020年,就在自家的数据中心上部署了当时最强的AI芯片——TPU v4。但直到今年的4月4日,谷歌才首次公布了这台AI超算的技术细节。论文地址:https://arxiv.org/abs/2304.01433相比于TPU v3,TPU v4的性能要高出2.1倍,而在整合4096个芯片之后,超算的性能更是提升了10倍。另外,谷歌还声称,自家芯片要比英伟达A100更快、更节能。与A100对打,速度快1.7倍论文中,谷歌表示,对于规模相当的系统,TPU v4可以提供比英伟达A100强1.

ai可以转成psd格式吗ai可以转成psd格式吗Feb 22, 2023 pm 05:56 PM

ai可以转成psd格式。转换方法:1、打开Adobe Illustrator软件,依次点击顶部菜单栏的“文件”-“打开”,选择所需的ai文件;2、点击右侧功能面板中的“图层”,点击三杠图标,在弹出的选项中选择“释放到图层(顺序)”;3、依次点击顶部菜单栏的“文件”-“导出”-“导出为”;4、在弹出的“导出”对话框中,将“保存类型”设置为“PSD格式”,点击“导出”即可;

ai顶部属性栏不见了怎么办ai顶部属性栏不见了怎么办Feb 22, 2023 pm 05:27 PM

ai顶部属性栏不见了的解决办法:1、开启Ai新建画布,进入绘图页面;2、在Ai顶部菜单栏中点击“窗口”;3、在系统弹出的窗口菜单页面中点击“控制”,然后开启“控制”窗口即可显示出属性栏。

GPT-4的研究路径没有前途?Yann LeCun给自回归判了死刑GPT-4的研究路径没有前途?Yann LeCun给自回归判了死刑Apr 04, 2023 am 11:55 AM

Yann LeCun 这个观点的确有些大胆。 「从现在起 5 年内,没有哪个头脑正常的人会使用自回归模型。」最近,图灵奖得主 Yann LeCun 给一场辩论做了个特别的开场。而他口中的自回归,正是当前爆红的 GPT 家族模型所依赖的学习范式。当然,被 Yann LeCun 指出问题的不只是自回归模型。在他看来,当前整个的机器学习领域都面临巨大挑战。这场辩论的主题为「Do large language models need sensory grounding for meaning and u

强化学习再登Nature封面,自动驾驶安全验证新范式大幅减少测试里程强化学习再登Nature封面,自动驾驶安全验证新范式大幅减少测试里程Mar 31, 2023 pm 10:38 PM

引入密集强化学习,用 AI 验证 AI。 自动驾驶汽车 (AV) 技术的快速发展,使得我们正处于交通革命的风口浪尖,其规模是自一个世纪前汽车问世以来从未见过的。自动驾驶技术具有显着提高交通安全性、机动性和可持续性的潜力,因此引起了工业界、政府机构、专业组织和学术机构的共同关注。过去 20 年里,自动驾驶汽车的发展取得了长足的进步,尤其是随着深度学习的出现更是如此。到 2015 年,开始有公司宣布他们将在 2020 之前量产 AV。不过到目前为止,并且没有 level 4 级别的 AV 可以在市场

ai移动不了东西了怎么办ai移动不了东西了怎么办Mar 07, 2023 am 10:03 AM

ai移动不了东西的解决办法:1、打开ai软件,打开空白文档;2、选择矩形工具,在文档中绘制矩形;3、点击选择工具,移动文档中的矩形;4、点击图层按钮,弹出图层面板对话框,解锁图层;5、点击选择工具,移动矩形即可。

See all articles

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

Hot Tools

MinGW - Minimalist GNU for Windows

MinGW - Minimalist GNU for Windows

This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

DVWA

DVWA

Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

SecLists

SecLists

SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

WebStorm Mac version

WebStorm Mac version

Useful JavaScript development tools

SublimeText3 Linux new version

SublimeText3 Linux new version

SublimeText3 Linux latest version