Revisiting the Reproduction of Sora: The Ones Looked Up To and the Ones Forgotten



Mar 27, 2024, 07:21 PM


On February 16, OpenAI released Sora, a blockbuster model in the field of video generation.

Sora's faith in the scaling law and its groundbreaking technical innovations have kept it at the forefront. It also proves once again that "brute force works miracles" still holds in text-to-video generation.

The technical details OpenAI has disclosed about Sora are far from enough to give a full picture, and Sora is not yet open to the public. Thoughts and discussions about Sora have not stopped since its release.

Sora's biggest impact on the AI field lies in the video generation ideas and frameworks it introduced. It also set off a wave of attempts to reproduce Sora that continues to this day.

The motivation to reproduce Sora comes, on the one hand, from engineers' technical persistence and ideals, and on the other, from the foreseeable commercial value.

What also cannot be ignored is that this AI research organization, still nicknamed "CloseAI", has become the industry benchmark: almost every product it releases brings disruptive innovation. Yet OpenAI seems to be going further and further down the closed-source road, which has further fueled the public's passion for reproducing Sora. We can expect multiple Sora-like models to be released, and open-sourced, one after another over the next few months.

More than a month after Sora's release, how far have the discussion and reproduction of its technical innovations progressed? Let's take a look.

This article looks at reproducing Sora from three angles:

It has been more than a month since Sora was released. How far has reproduction progressed?
How likely is it to be reproduced? What technical foundations exist in China?
Is Sora a world model? Can it help us reach AGI? Is it necessary to reproduce it?

Sora-like models

The three models that have been launched and discussed the most are Snap Video, Open-Sora 1.0, and Mora.

Snap Video

Snap Video is a Sora-like model released on February 29 by Snap, the company behind the photo-sharing app Snapchat, together with institutions such as the University of Trento. It uses a scalable spatio-temporal Transformer.

Further reading:

"The first batch of Sora-like models has arrived: Snap launches Snap Video, outperforming Pika and on par with Gen-2"

Open-Sora 1.0

Further reading:

"No need to wait for OpenAI: the fully open-source Open-Sora is here"

Mora

Mora is a multi-agent framework proposed a few days ago by researchers from Lehigh University and Microsoft Research. It integrates several advanced visual AI agents to replicate the general video generation capabilities that Sora has demonstrated.

Further reading: "Replicating Sora's general video generation capabilities: the open-source multi-agent framework Mora is here"

Although current reproductions still fall short of Sora, there have been clear technical breakthroughs in just over a month, which is an encouraging sign. By an incomplete count, nearly 10 teams in China are reproducing Sora; let us wait and see.

Architectural innovation before DiT

The DiT (Diffusion Transformer) architecture used by Sora is currently regarded as its biggest technical innovation, but looking back, domestic work in this direction may have come earlier.

U-ViT Architecture

In September 2022, a Tsinghua University team submitted the paper "All are Worth Words: A ViT Backbone for Diffusion Models", a few months before DiT. The paper proposes U-ViT, a Transformer-based network, to replace the CNN-based U-Net, which matches Sora's idea of combining Transformers with diffusion models.

Further reading:

"Can Chinese companies build a Sora? This large-model team from Tsinghua University offers hope"
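To make the U-ViT idea more concrete: as the paper's title suggests, U-ViT treats every input (the diffusion timestep, the condition, and the noisy image patches) as a token for a ViT backbone, and it adds long skip connections between shallow and deep layers in place of U-Net's convolutional encoder-decoder. The Python/NumPy sketch below illustrates only the input tokenization and one long-skip fusion step. It is a minimal illustration under stated assumptions, not the authors' implementation: the sinusoidal timestep embedding, the random weights standing in for learned layers, the 4x32x32 latent, the patch size of 2, and all function names are hypothetical, and positional embeddings and the Transformer blocks themselves are omitted.

```python
import numpy as np


def patchify(x: np.ndarray, p: int) -> np.ndarray:
    """Split a (C, H, W) latent into N = (H/p)*(W/p) flattened patches of size C*p*p."""
    c, h, w = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by the patch size"
    x = x.reshape(c, h // p, p, w // p, p)
    x = x.transpose(1, 3, 0, 2, 4)  # (H/p, W/p, C, p, p): one entry per patch
    return x.reshape((h // p) * (w // p), c * p * p)


def timestep_embedding(t: float, dim: int) -> np.ndarray:
    """Sinusoidal timestep embedding (an assumed, commonly used choice)."""
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)
    return np.concatenate([np.sin(t * freqs), np.cos(t * freqs)])


def build_tokens(x_noisy, t, class_id, p, dim, rng, num_classes=10):
    """U-ViT-style input sequence: [time token, condition token, patch tokens]."""
    patches = patchify(x_noisy, p)
    # Hypothetical random weights stand in for learned embedding layers.
    w_patch = rng.standard_normal((patches.shape[1], dim)) * 0.02
    class_table = rng.standard_normal((num_classes, dim)) * 0.02
    time_token = timestep_embedding(t, dim)[None, :]  # (1, dim)
    cond_token = class_table[class_id][None, :]       # (1, dim)
    patch_tokens = patches @ w_patch                   # (N, dim)
    return np.concatenate([time_token, cond_token, patch_tokens], axis=0)


def long_skip(shallow: np.ndarray, deep: np.ndarray, w_fuse: np.ndarray) -> np.ndarray:
    """Long skip connection: concatenate shallow and deep features, project back to dim."""
    return np.concatenate([shallow, deep], axis=-1) @ w_fuse


rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 32, 32))  # assumed 4-channel 32x32 latent
tokens = build_tokens(latent, t=500.0, class_id=3, p=2, dim=64, rng=rng)
print(tokens.shape)  # (258, 64): 1 time + 1 condition + 16*16 patch tokens

# Hypothetical stand-in for the Transformer blocks between a shallow and a deep layer.
deep = np.tanh(tokens @ (rng.standard_normal((64, 64)) * 0.1))
fused = long_skip(tokens, deep, rng.standard_normal((128, 64)) * 0.1)
print(fused.shape)  # (258, 64)
```

The concatenate-then-project fusion keeps the token width unchanged, so the sequence can flow into the next block; in a full model each such skip would pair a shallow block with its mirror-image deep block.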

Video Diffusion Transformer (VDT), released on arXiv in May 2023, was developed by a research team at Renmin University of China in collaboration with the University of California, Berkeley, and the University of Hong Kong. It is a Transformer-based unified video generation framework, and the paper also explains in detail why it adopts the Transformer architecture.

Further reading:

"Chinese universities build Sora-like model VDT; general-purpose video diffusion Transformer accepted at ICLR 2024"

In core technical innovation, domestic exploration may not lag behind at all and may even have led the way. However, owing to limited resources, technical roadmap choices, and other factors, it had not previously achieved results comparable to Sora.

Sora has undoubtedly validated a technically feasible path. Our own early exploration of the architecture should make it easier for us to reproduce Sora, and I am even optimistic that Sora may be surpassed in some areas.

Is Sora a world model?

Another heated discussion triggered by Sora concerns world models.

The videos Sora generates undoubtedly show some understanding of the physical world. In the classic clip of two pirate ships battling inside a cup of coffee, for example, physical properties such as fluid dynamics and lighting are visible to the naked eye.

But some scientists, led by Yann LeCun, argue strongly that Sora's training method has nothing to do with world models.

So is Sora a world model? Does it understand the physical world? The debate has spread across forums and livestreams, and opinions clearly differ on what a world model even is.

What is clear is that if Sora is a world model, artificial general intelligence (AGI) may arrive sooner than we expect, and reproducing Sora would then be necessary.

We remain curious about Sora and continue to explore possible answers to the following questions:

Can the video generation architectures and techniques that predate Sora still be used? How?
Who has been forgotten since Sora? Who is looked up to?
What are startups and teams other than OpenAI doing, and how?
Will Sora change the mainstream technical architecture? Will DiT-style architectures become the mainstream choice in the future?
Should Chinese technical teams reproduce Sora? Why?
Nearly 10 teams are known to be reproducing Sora. What landscape might emerge?
Why OpenAI? Can OpenAI's model be replicated?
What does the global video generation landscape look like after Sora? How will it develop and change?
What do you make of some star startups publicly stating that they will not build a Sora?
What is the future of multimodal large models?
How do you view Sora's impact from different perspectives (investors, non-technical people, state-owned enterprises, AI entrepreneurs, practitioners, etc.)?
What kind of social role does OpenAI play? What do you think of this company?
……

Sora's impact is disruptive, so the search for answers to the questions above will continue. As a team focused on exploring and applying cutting-edge AI technology, our AI technology forum is once again turning its attention to video generation.

On April 13, at Liudaokou in Beijing, we are hosting a technical forum on technical innovation, reflection, and applied practice after Sora's release. The event will bring together many important guests to discuss the questions above in more depth.

We believe this event can offer some positive impact and inspiration, and we hope it will help advance the technical development and dissemination of China's AI open-source community.

Guest lineup

This forum has a strong guest lineup. We have invited:

Zhang Junlin, a well-known industry technical expert, who will give an in-depth breakdown of Sora's core technology
Zeng Yan of ByteDance, author of the popular video generation model PixelDance, who will share the technical innovation and applications behind PixelDance
Dr. Gao Yizhao, team lead of the Sora-like model VDT and CEO of Sophon Engine, a startup incubated by Renmin University of China, who will walk through VDT's technical innovations and practice in detail
Investors play an indispensable role in the AI field. Chen Shi, investment partner at Fengrui Capital, will offer unique observations from the investor and institutional perspective
State-owned enterprises responded quickly after Sora's release and have claimed a place in AI. Tong Tong, head of algorithm technology at China Mobile Information Technology Co., Ltd., will share his latest thinking
Bian Zhengda, technical lead of the Sora-like model Open-Sora 1.0 and CTO of Luchen Technology, will explain in detail how to reproduce Sora, along with his team's distinctive thinking and practice
More important guests are being invited...

Zhang Junlin

Council member of the Chinese Information Processing Society of China; Ph.D., Institute of Software, Chinese Academy of Sciences

Currently head of new technology R&D at Sina Weibo. Previously a senior technical expert at Alibaba, where he led a new technology team. Author of the technical books "This Is Search Engine: Core Technologies Explained" and "Big Data Daily Record: Architecture and Algorithms".

Zeng Yan

Algorithm Engineer, ByteDance Research

Focuses on cutting-edge research in video generation, multimodal pre-training, and related areas. Models he has led have powered ByteDance businesses including video generation, short-video moderation, e-commerce customer service, Toutiao, and educational problem solving. As first author he has published eight papers at top international conferences and journals such as TPAMI, ICML, CVPR, and ACL, and he serves as a reviewer for TPAMI, ICML, NeurIPS, ICLR, and others. The PixelDance video generation foundation model he led was the first in the industry to combine high dynamics with stability, and the first to generate a 3-minute continuous story animation.

Chen Shi

Investment Partner, Fengrui Capital

Focuses on investment in technology, software, the internet, consumer, and other sectors. Before joining Fengrui Capital, he spent 5 years in management at Alibaba, serving as vice president of Alibaba Mobile Business Group, a senior executive of Alibaba Culture and Entertainment Group, and a management committee member for Youku and UC International, and was deeply involved in business decision-making and execution for product lines including UC, AutoNavi, Youku, Tudou, Shenma Search, and UC International.

He has 15 years of serial entrepreneurship. As a member of core management teams, he was deeply involved in building UC (the world's largest third-party mobile browser, acquired by Alibaba in 2014) and Lakala (a well-known Chinese third-party payment company, SZ: 300773), serving as vice president and CTO respectively. He was once a happy programmer, user-growth expert, and technology enthusiast.

He holds bachelor's and master's degrees in Mechanical and Electrical Engineering from Beijing University of Aeronautics and Astronautics. He was named to EqualOcean's "Top 30 Global Investors of 2023" and Jiazi Guangnian's "Top 20 Best Investors in Artificial Intelligence and Big Data 2022-2023".

Gao Yizhao

CEO, Sophon Engine

Ph.D. from the Gaoling School of Artificial Intelligence, Renmin University of China. An expert in multimodal large models, he has published many papers in top journals and conferences, led a multi-person team to train the WenLan large model, and has been involved throughout in developing and promoting Sophon Engine's models and products.

Bian Zhengda

CTO, Luchen Technology

Graduated from the National University of Singapore. He has published a paper at SC, the world's top supercomputing conference, has 7 years of experience in high-performance AI systems, and is a core developer of the Colossal-AI system.

Tong Tong

Head of Algorithm Technology, China Mobile Information Technology Co., Ltd.

Ph.D. in AI from the Institute of Automation, Chinese Academy of Sciences. He currently leads R&D in multimodal large models, digital humans, intelligent agents, and other areas at China Mobile Information Technology Co., Ltd., and has brought key technologies such as text-to-image, text-to-video, large-model action recognition, and object detection into production. He has published 12 papers and holds 12 company patents and 4 software copyrights.

More experts are being confirmed; stay tuned.

Video generation technology and application - Sora era

This site's AI technology forum has always closely tracked technical breakthroughs in AI. To explore Sora's technical impact and its effect on every industry in depth, we have organized the "Video Generation Technology and Application - Sora Era" AI technology forum.

We hope to help companies and practitioners keep up with technical trends and gain a comprehensive understanding of breakthroughs and applied practice in cutting-edge areas such as Sora, video generation, and multimodal large models.

Faced with the onslaught of AI video generation, only by actively learning and daring to experiment can we ride the technical wave and break through.

Looking forward to meeting you in Haidian District, Beijing on April 13, 2024.

Registration for the forum is now open. Scan the QR code on the poster to go directly to the event page. Because the guest introductions were released late, the early-bird discount period has been extended.

Until 23:55 on April 7, tickets come with a direct 200-yuan discount, for a special early-bird price of 699 yuan (original price 899 yuan). Group purchases of five people get additional exclusive discounts; see the event page for details.

Past attendees of this site's AI technology forums can add Alice on WeChat to get the exclusive discount link directly.

Event Highlights

Free permanent access to the videos and slides from the previous forum, "Video Generation Frontier Research and Application" (if you already bought them, contact Alice for a deduction; after buying a ticket for this event, remember to contact Alice to redeem the previous videos)
Permanent access to the post-event videos and slides of this "Video Generation Technology and Application - Sora Era" forum
University professors and leading industry technical experts, to help you master the latest technology and broaden your technical horizons
Face-to-face exchange with technical experts, with in-depth connections after the event
Coverage of core technology breakdowns, best practices from star products, and discussion of where the technology is heading
Learning support throughout: learning material packs before and after the event
Membership in a high-quality video generation technical community, to keep up with the industry's latest technology and news
15% off tickets for related paid events on this site

Technical Exchange Community

To facilitate technical exchange, we have also set up a video generation technical discussion group. Practitioners interested in Sora, video generation, and multimodal large models are welcome to scan the QR code to join and discuss technical details and industry observations in depth.

For business cooperation, group purchases, invoices, content, or other questions about this event, please add Alice, the event lead, on WeChat or contact her by email.

WeChat: 15650753618

Email: jiayaning@jiqizhixin.com

About invoices: After registering, you can apply for an invoice through the Huodongxing app after the event. Invoices are electronic VAT invoices and will be sent to your registration email address once issued.

Become a forum volunteer: Help run the event on site with tasks such as sign-in, guiding attendees, and order management. Meals are provided, and current students are given priority. If interested, please contact Alice.


This article is reproduced from 机器之心 (Synced).
