


Hot on the heels of Sora, there is a new AI video model, impressive enough to win praise all around!
With it, Gao Qiqiang, the villain of "The Knockout", transforms into legal scholar Luo Xiang and lectures everyone (dog head).
This is Alibaba’s latest audio-driven portrait video generation framework, EMO (Emote Portrait Alive).
With it, you can generate an AI video with vivid expressions from just a single reference image and a piece of audio (speech, singing, or rap all work). The length of the final video depends on the length of the input audio.
You can ask the Mona Lisa, a veteran of AI-effect demos, to recite a monologue:
Here comes a young and handsome Leonardo DiCaprio, who has no trouble keeping up with the mouth shapes of a fast-paced rap performance:
Even Cantonese lip-syncing holds up, letting Leslie Cheung sing Eason Chan's "Unconditional":
In short, whether it is making portraits sing (different styles of portraits and songs), making portraits speak (different languages), or all kinds of show-off cross-actor performances, EMO's results left us stunned.
Netizens lamented: "We are entering a new reality!"
The 2019 version of "Joker" said the lines of the 2008 version of "The Dark Knight"
Some netizens have even started pulling EMO-generated videos apart and analyzing them frame by frame.
As shown in the video below, the protagonist is the AI lady generated by Sora. The song she sang for you this time is "Don’t Start Now".
Commenters analyzed:
The consistency of this video is even better than before!
In the more than one minute video, the sunglasses on Ms. Sora’s face barely moved, and her ears and eyebrows moved independently.
The most exciting thing is that Ms. Sora’s throat seems to be really breathing! Her body trembled and moved slightly while singing, which shocked me!
EMO is not based on a DiT-like architecture; that is, it does not replace the traditional UNet with a Transformer. Its backbone network is modified from Stable Diffusion 1.5.
Specifically, EMO is an expressive audio-driven portrait video generation framework that can generate videos of any duration based on the length of the input audio.
The framework mainly consists of two stages:
- Frame encoding stage
A UNet network called ReferenceNet is deployed, responsible for extracting features from the reference image and from motion frames of the video.
- Diffusion stage
First, a pre-trained audio encoder computes the audio embedding, and a facial region mask is combined with multi-frame noise to control the generation of the face region.
The backbone network then drives the denoising operation. Two types of attention are applied in the backbone, reference attention and audio attention, which serve to maintain the character's identity consistency and to modulate the character's movements, respectively.
Additionally, temporal modules are used to manipulate the time dimension and adjust the speed of motion.
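To make the roles of the two attention types concrete, here is a minimal toy sketch of a single denoising block in numpy. All shapes, names (`denoise_block`, `ref_feats`, `audio_feats`), and the residual structure are illustrative assumptions for exposition, not the paper's actual implementation:

```python
# Toy sketch: one denoising block with reference attention (identity)
# and audio attention (motion), as cross-attention over two contexts.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query, context, d):
    # query: (n_q, d) latent tokens; context: (n_c, d) conditioning tokens
    scores = query @ context.T / np.sqrt(d)   # (n_q, n_c) similarity
    return softmax(scores) @ context          # (n_q, d) attended output

def denoise_block(latent, ref_feats, audio_feats, d=64):
    # Reference attention: pull the latent toward ReferenceNet features,
    # keeping the character's identity consistent across frames.
    latent = latent + cross_attention(latent, ref_feats, d)
    # Audio attention: condition the latent on audio embeddings,
    # modulating the character's motion.
    latent = latent + cross_attention(latent, audio_feats, d)
    return latent

rng = np.random.default_rng(0)
d = 64
latent = rng.standard_normal((16, d))       # noisy video latent tokens
ref_feats = rng.standard_normal((16, d))    # ReferenceNet features
audio_feats = rng.standard_normal((8, d))   # audio encoder embeddings
out = denoise_block(latent, ref_feats, audio_feats, d)
print(out.shape)  # (16, 64)
```

In the real framework these would be learned multi-head attention layers inside the Stable Diffusion 1.5 UNet, with the temporal modules additionally attending across the frame dimension.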
In terms of training data, the team built a large and diverse audio-video dataset containing more than 250 hours of video and more than 150 million images.
The specific features of the final implementation are as follows:
- Videos of any duration can be generated based on the input audio while ensuring character identity consistency (the longest single video given in the demonstration is 1 minute 49 seconds).
- Supports talking and singing in various languages (the demo includes Mandarin, Cantonese, English, Japanese, Korean)
- Supports different painting styles (photos, traditional paintings, comics, 3D rendering, AI digital person)
Quantitative comparisons also show large improvements over previous methods, achieving SOTA; only the SyncNet metric, which measures lip-sync quality, is slightly inferior.
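For context on what that metric measures: SyncNet-style scoring embeds short audio windows and mouth-crop windows, then scores sync by how sharply their similarity peaks at the correct time offset. The sketch below is a toy illustration with random stand-in embeddings; the real SyncNet uses learned convolutional encoders, and the function names here are assumptions:

```python
# Toy sketch of SyncNet-style lip-sync confidence: best cross-offset
# similarity minus the median, so a sharp peak at the true alignment
# indicates good audio-visual sync.
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def sync_confidence(audio_emb, video_emb, max_offset=3):
    # audio_emb, video_emb: (n_frames, d) per-window embeddings
    sims = []
    for off in range(-max_offset, max_offset + 1):
        pairs = [cosine(audio_emb[i], video_emb[i + off])
                 for i in range(len(audio_emb))
                 if 0 <= i + off < len(video_emb)]
        sims.append(np.mean(pairs))
    return max(sims) - float(np.median(sims))

rng = np.random.default_rng(1)
video = rng.standard_normal((20, 32))                       # mouth-crop embeddings
audio_synced = video + 0.1 * rng.standard_normal((20, 32))  # aligned audio track
audio_shuffled = rng.standard_normal((20, 32))              # unrelated audio
print(sync_confidence(audio_synced, video),
      sync_confidence(audio_shuffled, video))
```

A well-synced pair yields a much higher confidence than an unrelated pair, which is the property the benchmark exploits.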
Compared with other methods that do not rely on diffusion models, EMO is more time-consuming.
And since no explicit control signals are used, other body parts such as hands may be generated inadvertently; a potential solution is to use control signals dedicated to those body parts.
EMO’s team
Finally, let’s take a look at the people on the team behind EMO.
The paper shows that the EMO team comes from Alibaba Intelligent Computing Research Institute.
There are four authors, namely Linrui Tian, Qi Wang, Bang Zhang and Liefeng Bo.
Among them, Liefeng Bo is the current head of the XR laboratory of Alibaba Tongyi Laboratory.
Dr. Liefeng Bo graduated from Xidian University and did postdoctoral research at the Toyota Technological Institute at Chicago (TTIC) and the University of Washington. His research focuses on machine learning, computer vision, and robotics, and his Google Scholar citations exceed 13,000.
Before joining Alibaba, he first served as chief scientist at Amazon’s Seattle headquarters, and then joined JD Digital Technology Group’s AI laboratory as chief scientist.
In September 2022, Bo Liefeng joined Alibaba.
EMO is not the first time Alibaba has achieved success in the AIGC field.
Earlier there was OutfitAnyone, for one-click AI outfit changes.
There is also AnimateAnyone, which set cats and dogs all over the world dancing.
Now that EMO has launched, many netizens are remarking that Alibaba has built up real technical depth here.
If all these technologies are combined now, the effect will be...
I don’t dare to think about it, but I’m really looking forward to it.
In short, we are getting closer and closer to "send a script to AI and output the entire movie".
One More Thing
Sora represents a step-change breakthrough in text-driven video synthesis.
EMO also represents a new level of audio-driven video synthesis.
Although the two tasks are different and the specific architecture is different, they still have one important thing in common:
Neither has an explicit physical model in the middle, yet both simulate physical laws to a certain extent.
Therefore, some people believe this contradicts LeCun's insistence that "modeling the world for action by generating pixels is wasteful and doomed to failure", and supports Jim Fan's "data-driven world model" idea.
Various methods failed in the past while the current ones succeed; the reason may really be the "Bitter Lesson" of Sutton, the father of reinforcement learning: scale works miracles.
Enable AI to discover as people do, rather than building in what people have discovered.
Breakthrough progress ultimately comes from scaling up computation.
Paper: https://www.php.cn/link/a717f41c203cb970f96f706e4b12617b
GitHub: https://www.php.cn/link/e43a09ffc30b44cb1f0db46f87836f40
