
700,000 People Rush to Try It! "Kling AI", the New King of Video Generation, Has Been Upgraded Yet Again

王林 (original) | 2024-07-20 05:09:40 | 991 views

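Could the era of AI-generated short dramas really be upon us?

Lately, the demos released by all kinds of video-generation AIs have been dazzling. From meme images, to competing on clip length, to stressing real-world physics, the stream of AI creativity is hard to rank, and every one of them wants to measure itself against Sora. Then, quietly, someone got out ahead and delivered "cinematic" results:

From realistic lighting effects: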


to rich imagination with every element in place, it handles it all:


Source: https://x.com/blizaine/status/1806383419661730197

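Some people are already trying to use this capability for complex tasks. With a video-generation AI, a music-generation AI, and a little Photoshop and After Effects, a complete music video can be produced.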


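Ask netizens what they think of the results, and they ask back: "Hollywood, what do you think?"

The videos this AI generates are smooth and finely detailed, and they have drawn a wave of likes. Look closer and there are plenty more short videos made with it on social networks.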

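According to users' summaries, the new AI's main advantage is that it is less prone to inventing spurious details when generating large motions. Another example is image-to-video, here with a galloping centaur: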


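The generative AI behind these videos is Kling AI (可靈AI), a large model from Kuaishou. It went viral worldwide a few weeks ago, when access was already said to be nearly impossible to get.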

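That's right: this is not a PPT-style launch that shows off a few demos first, but a product-grade application that was opened up from the start. Kling AI now has a web version, whose selling point is being simple and easy to use.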

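According to the latest figures, nearly 700,000 users have applied for access to Kling AI, making it the hottest video-generation model on the internet.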

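Several upgrades in one month: Kling AI's breakneck evolution

This year is year one for generative AI. Back in February, OpenAI's Sora raised the competition to the level of video generation, but the first to actually ship were Chinese tech companies.

Since its official debut on June 6, in just one month, Kuaishou's Kling AI, the first Chinese large model to spark heated discussion in overseas AI circles, has gone through three iterations.

From text-to-video at the start, to support for image-to-video, video extension, and multiple aspect ratios two weeks later, Kling AI has become steadily stronger and more complete. Almost without anyone noticing, the various needs of video generation seem to have been covered.

Just last weekend, at the World Artificial Intelligence Conference (WAIC 2024), Kling AI received its third major upgrade, releasing a series of new features that substantially improve the texture, aesthetics, and playability of generated video and bring another leap in the creative experience.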

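Gai Kun, Senior Vice President of Kuaishou and head of its main-site business and community science line, introduced the three headline features of this upgrade: a high-quality version, first/last frame control, and camera control.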


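First, the base model has been upgraded. After the upgrade, the image quality of generated videos is a qualitative leap over the previous model.

Thanks to a higher spatio-temporal training resolution, Kling AI has also improved considerably in generated detail, composition, the aesthetics of camera movement, and lighting.

The image-quality comparison below makes the difference between Kling AI's previous model and its latest model clear at a glance.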

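Second, Kling AI has added to image-to-video the practical and much-requested "first/last frame control" feature, making image-to-video clips whose opening and closing frames echo each other a reality.

By specifying custom start-frame and end-frame images, users can precisely control smooth transitions between different clips and achieve effects such as a continuous one-take shot. Judging from actual results, the motion is natural and fluid and image quality holds up. The feature gives users a more intuitive and convenient editing experience and meets the need for personalized image-to-video generation.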

For example, generating a video from the following two images:

The result looks like this:

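Finally, Kling AI has added camera-movement controls and an automatic "master shot" mode. In video, combining more camera shots captures more of the scene and strengthens overall expressiveness.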

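Kling AI offers six preset classic camera controls: Roll, Tilt (vertical rotation), Pan (horizontal rotation), Vertical (vertical movement), Horizontal (horizontal movement), and Zoom (push in / pull out), giving a rich set of choices for different scenes. Users can also set positive or negative parameter values for these moves to control how intense or gentle the motion is and to reverse its direction. Meanwhile, master-shot camera movement helps produce eye-catching footage with a strong cinematic feel.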
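To make the signed-parameter idea concrete, here is a minimal Python sketch of how such a camera setting could be represented. Only the six move names come from the article; the `CameraMove` class, the `MAX_STRENGTH` bound, and the meaning attached to the sign are assumptions for illustration, not Kling AI's actual interface.

```python
from dataclasses import dataclass

# The six preset move names come from the article; everything else is assumed.
CAMERA_MOVES = ("roll", "tilt", "pan", "vertical", "horizontal", "zoom")
MAX_STRENGTH = 10.0  # hypothetical bound, not a documented Kling AI limit


@dataclass(frozen=True)
class CameraMove:
    kind: str      # one of CAMERA_MOVES
    amount: float  # signed: magnitude = intensity, sign = direction

    def __post_init__(self):
        # Reject move names and strengths outside the assumed ranges.
        if self.kind not in CAMERA_MOVES:
            raise ValueError(f"unknown camera move: {self.kind!r}")
        if abs(self.amount) > MAX_STRENGTH:
            raise ValueError(f"amount must be within +/-{MAX_STRENGTH}")

    def reversed(self) -> "CameraMove":
        """Same move and intensity, opposite direction."""
        return CameraMove(self.kind, -self.amount)

    def describe(self) -> str:
        if self.amount == 0:
            return f"{self.kind}: static"
        direction = "positive" if self.amount > 0 else "negative"
        return f"{self.kind}: {direction} direction, intensity {abs(self.amount):g}"


gentle_push = CameraMove("zoom", 2)
print(gentle_push.describe())
print(gentle_push.reversed().describe())
```

A single signed number keeps each move to one slider: a larger magnitude means a more vigorous move, and flipping the sign reverses it.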

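As these new features show, Kling AI has visibly improved in video clarity, aesthetic quality, and customizable control over content. Beyond that, the Kling AI web version now officially available to users integrates text-to-image and text-to-video, with video editing to be supported soon, making it a one-stop visual content creation platform that is usable at launch.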

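The new first/last frame control and camera control features are currently available on the web version, so anyone who wants to try them can go apply now!

Kling AI web version: klingai.kuaishou.com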

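It is no exaggeration to call this upgrade "full of sincerity", and it rests on Kuaishou's continued innovation in video-generation capability and technology.

"Cinematic" AI generation: it is all technology underneath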

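Compared with image generation, which is already quite mature, video generation is a more complex task. In practice it faces many challenges: realism, motion coherence, visual smoothness, detail accuracy, consistency of scenes, characters, and lighting, physical accuracy, duration limits, and more. How well a model handles them directly determines how practical and usable it is. The re-upgraded Kling AI has clearly changed dramatically on these fronts. To sum up, Kling AI has seven capability highlights. Wan Pengfei, head of Kuaishou's Visual Generation and Interaction Center, analyzed them one by one: together they form Kling AI's core competitiveness in video image quality, image-to-video, motion generation, generation length, physical laws, instruction following, and video controllability, and they make today's all-round Kling AI. Wan Pengfei also looked ahead, saying that video generation quality is improving very quickly and is gradually approaching graphics rendering and camera footage, which will bring new opportunities to the broader film and video industry.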

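According to Wan Pengfei, the new features shown above, including camera control, are a further evolution of three of Kling AI's capabilities: cinematic HD image generation, leading image-to-video results, and strong controllability of video generation. Of these, cinematic HD image generation can render grand or subtle scenes, such as sweeping natural landscapes and the movements and expressions of people or animals, vividly and with high fidelity, with a strong blockbuster feel.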

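The leading image-to-video capability brings static images to life, turning them into vivid 5-second clips. Paired with different text inputs, it makes image-to-video more creative and more freely controllable.

For example, converting an image of a swimming puppy into a video:

The result looks like this: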

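Strong video-generation controllability puts finer-grained video creation in users' hands. Beyond this release's camera control, Kling AI will add controllable adjustment in more areas, including voice-to-face matching, character ID preservation, and control of the picture and the evolution of its layout through simple stroke prompts. Model training is already complete, and these features will go live soon.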

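Kling AI has also been further upgraded in its other four capabilities: motion generation, generation length, physical laws, and instruction following.

First, Kling AI can generate large yet plausible motion. By modeling complex spatio-temporal motion, it can produce movement of large amplitude that still obeys the laws of motion.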

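Thanks to more thorough model training, the overall motion Kling AI generates is now livelier, supporting a larger range of motion without loss of plausibility. The turning and walking of the kitten below, for example, are rendered naturally and plausibly, consistent with physical reality.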

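Second is minute-scale long video generation. Minute-scale duration has become an important metric for evaluating a video generation model; it requires more effective multi-shot handling, longer storytelling, and more coherent, consistent motion extension.

Kling AI can currently generate videos several minutes long at 1080p and 30 fps. It has also opened a video extension feature that follows user instructions: a single extension lengthens the video by 4 to 5 seconds, multiple consecutive extensions are supported, and the result can be up to 3 minutes long. The direction of the story can be specified at each extension, which makes the feature very easy to use.

With this upgrade, joint deep optimization at the algorithm and engineering levels has raised the length of a single generation from 5 seconds to 10 seconds, the longest among products open to users. That allows a more complete storyline and gives users more creative room.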
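As a rough worked example of the figures above, the short Python sketch below estimates how many consecutive extensions it would take to grow a single generation (5 or 10 seconds) to the 3-minute mark when each extension adds 4 to 5 seconds. The function name and the assumption that every extension adds a fixed whole number of seconds are illustrative simplifications, not a description of how Kling AI schedules extensions.

```python
def extensions_needed(base_s: int, per_extension_s: int, target_s: int) -> int:
    """Smallest n such that base_s + n * per_extension_s >= target_s."""
    if per_extension_s <= 0:
        raise ValueError("per_extension_s must be positive")
    remaining = max(0, target_s - base_s)
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-remaining // per_extension_s)


TARGET_S = 180  # 3 minutes, the maximum length quoted in the article

for base in (5, 10):      # single-generation length before / after the upgrade
    for step in (4, 5):   # seconds added per extension (article: 4 to 5)
        n = extensions_needed(base, step, TARGET_S)
        print(f"base {base:>2}s, +{step}s per extension: {n} extensions")
```

Under these assumptions a 3-minute video takes a few dozen extensions either way, so the longer 10-second base clip saves only one or two steps; its main benefit is a longer coherent shot per generation.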

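Third, Kling AI can simulate the properties of the complex physical world. Since Sora, every video generation model has put great weight on generating videos that obey physical laws, which sets the upper bound on a model's capability.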

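From its initial release, Kling AI was able to model and simulate real-world properties accurately, making generated videos close to reality, for example bathing a kitten.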

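Now, with more thorough model training, Kling AI's ability to model and simulate interactive physics has moved up another level.

Fourth, Kling AI is very strong at concept combination and instruction following. Technically, through a deep understanding of text-to-video cross-modal semantics, Kling AI can turn users' rich imagination into concrete video footage with ease, letting ideas run wild, for example a coffee-cup volcano.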

The upgraded Kling AI uses better text data and a better encoding scheme, so its response to user prompts is naturally stronger and its visual rendering is better.

All of these capabilities stem from Kling AI's accumulated technology and original innovations in its technical route for video generation (a DiT architecture), model design (such as latent-space encoding and decoding, temporal information modeling, and text expansion and encoding), data (such as a multi-dimensional labeling system and a video description model), computing efficiency (such as distributed training clusters and a staged training strategy), and capability expansion (such as temporal extension of videos and controllable multi-modal input).

It is fair to say that today's Kling AI is technically advanced and reliable. No wonder it was in such demand as soon as it launched.

In the era of generative AI, Kuaishou comes prepared

Over the past year or so, the whole large-model field has been very busy. Last year the talk was about developing base models; this year everyone is talking about applications. With the opening of WAIC in recent days, we have witnessed another round of debate between the "model school" and the "application school".

In this wave, what is Kuaishou doing?

First, it builds the whole stack. From the underlying IDC computing centers to the network architecture and AI platform, to the core large models in the middle layer, to the exploration of various uses at the application layer, Kuaishou has developed a complete system in-house. Speaking about this system, Zhang Di, vice president of Kuaishou and head of its large model team, said he believes that firm investment in independent research and development brings a "technical snowball" effect and large cost advantages in the long run. One major advantage for Kuaishou is that it has a large number of AI application scenarios at the top layer, which create many opportunities to put large models into practice.

The base model determines the upper limit of AI capabilities, and quantitative increases in research investment can lead to qualitative change. Commercial application, in turn, lets the technology snowball: new technology can be put into use in stages and feedback gathered continuously, gradually forming a virtuous cycle.

Last year Kuaishou introduced the "KwaiYi" large model, which quickly grew from an early 13B parameters to 175B and gained a multi-modal version. After several iterations, the KwaiYi model has begun to play a role inside Kuaishou in scenarios such as material creation, AI interaction, and content production. In June this year, the daily consumption of KwaiYi-based AIGC marketing materials exceeded 20 million.

With the base model in place, Kuaishou has gradually developed its own differentiated capabilities in more scenarios.

Specifically, in text-to-image, Kuaishou's "Ketu" (可图) has become one of the top models in the industry, with strong semantic understanding and instruction-following capabilities. Thanks to innovations in text representation and a great deal of work on image data alignment, Ketu can produce camera-grade image texture. After reinforcement learning training, its aesthetics have also been aligned with general human standards.

In video generation, "Kling AI" has ignited a new round of competition in the global field. It can generate video from text and from images, and has rich image editing capabilities. It remains among the industry's best in the controllability, texture, aesthetics, and motion plausibility of generated video. Kuaishou engineers are continuing to optimize the engineering and algorithms to keep lowering the threshold for video-generation AI.

Speaking of thresholds, optimizing new technologies for real-world use is one of the important challenges generative AI currently faces. As a national-level short video application, Kuaishou has the advantage of a large number of AI application scenarios, which provide the settings and opportunities for putting the technology into practice.

In putting the technology into practice, Kuaishou has reached a series of milestones:

"AI Xiaokuai", Kuaishou's conversational model application in the app's comment area, can understand the content of a video and interact with users. During testing so far it has accumulated more than 10 million followers.

In e-commerce live-streaming rooms, using the text-to-image AI "Ketu", users can quickly try on clothes with their own everyday photos, and even see dynamic displays.

The video generation model "Kling AI" has been widely recognized by users since its release. It has generated a total of 7 million videos and opened a one-stop content creation platform.

From content production and understanding to recommendation, and from individuals to e-commerce, Kuaishou's generative AI capabilities now cover its main business in full and continue to drive the development of the Kuaishou ecosystem.

Finally, there is a new experiment. At WAIC, Kuaishou announced that its first AIGC short drama, "Mountains and Seas Strange Mirror: Cutting Waves", will be officially released this month.

The drama has in-depth technical support from Kling AI and uses a cyber style to recreate the ancient mythical world of the Classic of Mountains and Seas. Judging from the trailer, scenes ranging from mountains to oceans and from forests to the sky all show striking visual effects. In the past, such effects might have required a professional special effects team; now visual generation AI can deliver them.

Yes, half a year ago we were still imagining the future, but now AI has really started to make movies.

In the current wave of large models, nothing proves technical capability better than large-scale deployment.

And Kuaishou's all-round practice confirms once again that the productivity of AI has been changing our lives without our noticing.
