On February 16, OpenAI released Sora, a blockbuster model in the field of video generation.
OpenAI's faith in scaling laws and its groundbreaking technical innovations have kept Sora at the forefront. Sora also proves once again that "brute force works miracles" still applies to text-to-video generation. The technical details OpenAI has disclosed are far from enough to give a full picture, and Sora is not yet open to the public. Even so, reflection on and discussion of Sora have not stopped since its release.
Sora's biggest impact has been on the video generation ideas and frameworks used across the AI field. It has also triggered a wave of attempts to reproduce Sora that continues to this day.
The motivation to reproduce Sora comes partly from the technical persistence and ideals of engineers, and partly from its foreseeable business value.
It is also worth noting that OpenAI, the AI research institution that keeps being nicknamed "CloseAI", has become an industry benchmark: almost every product it releases brings disruptive innovation. But OpenAI seems to be going further and further down the closed-source road, which has further ignited the public's passion for reproducing Sora. We have reason to believe that in the next few months, multiple Sora-like models will be released and open sourced one after another.
In the month or so since Sora was released, how far have the discussion and reproduction of its technical innovations progressed? Let's take a look.
On the reproduction of Sora, this article looks at the following three questions:
- More than a month after Sora's release, how far has the reproduction effort come?
- How likely is a successful reproduction? What is the domestic technical foundation?
- Is Sora a world model? Can it help us reach AGI? Is it necessary to reproduce it?
Sora-like models
Three models that have already been released and are widely discussed are Snap Video, Open-Sora 1.0, and Mora.
- Snap Video
Snap Video is a Sora-like model released on February 29. It uses a scalable spatio-temporal Transformer and comes from Snap, the company behind the Snapchat picture-sharing app, together with institutions such as the University of Trento.
Related reading:
"The first batch of Sora-like models has appeared: Snap launches Snap Video, which outperforms Pika and is not inferior to Gen-2"
- Open-Sora 1.0
Open-Sora 1.0, fully open sourced on March 18, is the first open-source Sora-like model. It comes from the Colossal-AI team and covers the entire training process, including data processing, all training details, and model weights.
Related reading:
"Don't wait for OpenAI, wait for Open-Sora to be fully open source"
- Mora
Mora is a multi-agent framework proposed a few days ago by researchers from Lehigh University and Microsoft Research. The framework integrates several advanced visual AI agents to replicate the general video generation capabilities that Sora has demonstrated.
Although the current reproductions are still not as good as Sora, clear technical breakthroughs have appeared in just over a month, which is an encouraging signal. According to incomplete statistics, nearly 10 domestic teams are reproducing Sora. Let us wait and see.
Technical architecture innovation before DiT
The DiT (Diffusion Transformer) architecture used by Sora is currently its biggest technical innovation, but looking back, domestic research may have got there earlier.
- U-ViT architecture
In September 2022, a Tsinghua team submitted a paper titled "All are Worth Words: A ViT Backbone for Diffusion Models", two months before DiT. The paper proposes replacing the CNN-based U-Net with U-ViT, a Transformer-based network architecture, which matches Sora's idea of combining the Transformer with diffusion models.
Related reading:
"Can domestic companies be expected to build Sora? This large-model team from Tsinghua University gives hope"
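To make the U-ViT idea concrete: a Transformer backbone works on a sequence of tokens rather than on a pixel grid, so the image is first cut into patches and each patch becomes one token. The U-ViT paper also treats the diffusion timestep as a token. The sketch below illustrates only that tokenization step in plain Python. The helper names `patchify` and `build_sequence`, the 4x4 toy image, and the constant-valued timestep token are assumptions made for illustration (a real model would use learned embeddings). It is not the authors' implementation.

```python
from typing import List

Image = List[List[float]]   # single-channel H x W image as nested lists
Token = List[float]         # one flattened patch


def patchify(image: Image, patch: int) -> List[Token]:
    """Split an image into non-overlapping patch x patch tiles.

    Each tile is flattened in row-major order into one token.
    """
    height, width = len(image), len(image[0])
    if height % patch or width % patch:
        raise ValueError("image sides must be divisible by the patch size")
    tokens: List[Token] = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            tokens.append([image[top + dy][left + dx]
                           for dy in range(patch)
                           for dx in range(patch)])
    return tokens


def build_sequence(image: Image, patch: int, timestep: int) -> List[Token]:
    """Prepend a timestep token to the patch tokens.

    Assumption: the timestep token is a placeholder vector filled with the
    timestep value; a trained model would use a learned embedding instead.
    """
    time_token = [float(timestep)] * (patch * patch)
    return [time_token] + patchify(image, patch)


# Toy 4x4 image whose pixel at (row, col) has value 4 * row + col.
image = [[float(4 * r + c) for c in range(4)] for r in range(4)]
sequence = build_sequence(image, patch=2, timestep=10)
print(len(sequence))   # 1 timestep token + 4 patch tokens
```

In a full model this token sequence would be fed to Transformer blocks in place of the convolutional U-Net. Everything past tokenization is outside the scope of this sketch.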
- VDT
Video Diffusion Transformer (VDT) was released on arXiv in May 2023. The work was led by a research team at Renmin University of China in cooperation with the University of California, Berkeley, and the University of Hong Kong. It is a unified Transformer-based video generation framework, and the paper also explains in detail why the Transformer architecture was adopted.
Related reading:
"Domestic universities build the Sora-like model VDT; the universal video diffusion Transformer is accepted by ICLR 2024"
In terms of core technical innovation, domestic exploration may not be lagging behind but leading the way. However, because of resource constraints, technical roadmap planning, and other reasons, it had not previously achieved results comparable to Sora's.
Sora has undoubtedly verified a technically feasible path. Our own early exploration of the architecture should make it easier for us to reproduce Sora, and we are even optimistic about surpassing Sora in some areas.
Is Sora a world model?
Another hot discussion triggered by Sora is about world models.
The videos generated by Sora undoubtedly show some understanding of the physical world. In the classic "pirate ships battling in a coffee cup" clip, for example, physical characteristics such as fluid dynamics and lighting are visible to the naked eye.
But some scientists, represented by Yann LeCun, strongly argue that Sora's training method has nothing to do with world models.
So is Sora a world model? Does it understand the physical world? Discussion of these questions has spread across forums and live broadcasts, and it is clear that people hold different views on what a world model even is.
What we can say for certain is that if Sora is a world model, then artificial general intelligence (AGI) may arrive sooner than we expect. In that case, reproducing Sora is necessary.
We remain curious about Sora and continue to explore possible answers to the following questions.
- Can the video generation architectures and technologies that came before Sora still be used? If so, how?
- Who is forgotten after Sora? Who is looked up to?
- What should startups and teams other than OpenAI do? How should they do it?
- Will Sora change the mainstream technical architecture? Will the architecture represented by DiT be the mainstream choice in the future?
- Should domestic teams reproduce Sora? Why?
- Nearly 10 teams are known to be reproducing Sora. What landscape might we see in the future?
- Why OpenAI? Can OpenAI's model be replicated?
- What does the global video generation landscape look like after Sora? How will it develop and change?
- What do you make of some star startups publicly stating that they will not build a Sora?
- What is the future of multi-modal large models?
- How do you view Sora's impact from different perspectives? (Investors, non-technical people, state-owned enterprises, AI entrepreneurs, practitioners, etc.)
- What kind of social role does OpenAI play? What do you think of this company?
……
The impact of Sora is disruptive, so the search for answers to the questions above will continue. As a team focused on exploring and applying cutting-edge AI technologies, our AI technology forum is once again turning to the field of video generation. On April 13, at Liudaokou, Beijing, we will hold a technical forum focused on technical innovation, thinking, and application practice after the release of Sora. The event will bring together many important guests, and we will discuss the questions above in more depth. We believe this event can offer positive value and inspiration, and we hope it will promote the development and dissemination of technology in China's open-source AI community.
Guest lineup
This forum has a strong guest lineup. We have invited:
- Mr. Zhang Junlin, a well-known technical expert in the industry, who will give an in-depth breakdown of Sora's core technology
- Zeng Yan of ByteDance, author of the popular video generation model PixelDance, who will share the technical innovation and applications behind PixelDance
- Dr. Gao Yizhao, team lead for the Sora-like model VDT and CEO of Sophon Engine, a startup incubated by Renmin University of China, who will break down VDT's technical innovation and practice in detail
- Investors play an indispensable role in the AI field. Chen Shi, investment partner at Fengrui Capital, will bring unique observations from the perspective of investors and institutions
- State-owned enterprises responded quickly after the release of Sora and hold a place in the AI field. Mr. Tong Tong, head of algorithm technology at China Mobile Information Technology Co., Ltd., will share his latest thinking
- Mr. Bian Zhengda, CTO of Luchen Technology and technical lead for the Sora-like model Open-Sora 1.0, who will explain in detail how to reproduce Sora, along with his team's own thinking and practice
- More important guests are being invited...
Zhang Junlin
Director of the Chinese Information Processing Society of China; Ph.D., Institute of Software, Chinese Academy of Sciences
Currently head of new technology research and development at Sina Weibo. Previously, he was a senior technical expert at Alibaba, where he was responsible for the new technology team. He is the author of the technical books "This is Search Engine: Detailed Explanation of Core Technology" and "Big Data Daily Record: Architecture and Algorithms".
Zeng Yan
ByteDance Research Algorithm Engineer
Focuses on cutting-edge research in areas such as video generation and multi-modal pre-training. The models he has led have supported ByteDance businesses including video generation, short video review, e-commerce customer service, Toutiao, and educational problem solving. He has published eight related papers as first author in top international conferences and journals such as TPAMI, ICML, CVPR, and ACL, and serves as a reviewer for TPAMI, ICML, NIPS, ICLR, and others. The PixelDance video generation foundation model developed under his lead was the first in the industry to combine high dynamics with stability, and the first to generate a 3-minute animation with a continuous plot.
Chen Shi
Investment Partner, Fengrui Capital
Focuses on investment in technology, software, the Internet, consumer, and other fields. Before joining Fengrui Capital, he had 5 years of management experience at Alibaba. He served as vice president of Alibaba Mobile Business Group, a senior executive of Alibaba Culture and Entertainment Group, and a member of the leadership committees of Youku and UC International, and was deeply involved in business decision-making and management for product lines including UC, AutoNavi, Youku, Tudou, Shenma Search, and UC International.
He has 15 years of continuous entrepreneurial experience. As a member of the core management team, he was deeply involved in building UC (the world's largest third-party mobile browser, acquired by Alibaba in 2014) and Lakala (a well-known Chinese third-party payment company, SZ: 300773), serving as vice president and CTO respectively. He was once a happy programmer, user growth expert, and technology enthusiast.
He holds bachelor's and master's degrees in Mechanical and Electrical Engineering from Beijing University of Aeronautics and Astronautics. In 2023, he was named to EqualOcean's "Top 30 Global Investors in 2023" and Jiazi Guangnian's "Top 20 Best Investors in Artificial Intelligence and Big Data in 2022-2023".
Gao Yizhao
Ph.D. from the Gaoling School of Artificial Intelligence, Renmin University of China. An expert in multi-modal large models, he has published many papers in top journals and conferences and led a team that completed the training of the Wenlan large model. He has taken part throughout in the development and promotion of Sophon Engine's models and products.
Bian Zhengda
CTO of Luchen Technology
Graduated from the National University of Singapore. He has published a paper at SC, the world's top supercomputing conference. He has 7 years of experience in high-performance AI systems and is a core developer of the Colossal-AI system.
Tong Tong
Head of Algorithm Technology of China Mobile Information Technology Co., Ltd.
Ph.D. in AI from the Institute of Automation, Chinese Academy of Sciences. He is currently responsible for research and development in multi-modal large models, digital humans, intelligent agents, and related fields at China Mobile Information Technology Co., Ltd., and has put into production key technologies such as text-to-image generation, text-to-video generation, and large-model action recognition and object detection. He has published 12 papers and holds 12 company patents and 4 software copyrights.
More experts are being confirmed, so stay tuned.
Video generation technology and application - Sora era
This site's AI technology forum closely tracks technical breakthroughs in the AI field. To explore in depth Sora's impact on technology and on all walks of life, we have specially planned the "Video Generation Technology and Application - Sora Era" AI technology forum.
We hope to help enterprises and practitioners keep up with technical trends and gain a comprehensive understanding of breakthroughs and application practice in cutting-edge fields such as Sora, video generation technology, and multi-modal large models.
Faced with the onslaught of AI video generation, only by actively embracing learning and daring to try can we seize the technological trend and break through.
We look forward to meeting you in Haidian District, Beijing, on April 13, 2024. Registration for the forum is now open. Scan the QR code on the poster to go directly to the event page. Because the guest introductions were released late, the early bird discount period for this forum has been extended. From now until 23:55 on April 7, tickets are discounted by 200 yuan, for a special early bird price of 699 yuan (original price 899 yuan). There are further exclusive discounts for group purchases of five people. Please see the event details page. Past participants of this site's AI technology forum should add Alice on WeChat to get the exclusive discount link.
Activity Highlights
- Free permanent access to the videos and courseware of the previous forum, "Video Generation Frontier Research and Application" (if you already purchased the previous event, please contact Alice for a deduction; after purchasing this one, remember to ask Alice to redeem the previous video)
- Permanent access to the post-event videos and courseware of this "Video Generation Technology and Application - Sora Era" forum
- University professors and heavyweight industry experts in one place, to help you master the latest technology and broaden your technical horizons
- Face-to-face communication with technical experts, and deeper connections after the meeting
- Coverage of core technology breakdowns, best practices from star products, and discussion of where the technology is heading
- Learning support throughout: a gift pack of learning materials before and after the conference
- Membership of a high-quality video generation technology community, to follow the industry's cutting-edge technology and news in good time
- A 15% discount on tickets for related paid events run by this site
Technical Exchange Community
To facilitate technical exchange, we have also set up a video generation technology exchange group. Technical practitioners who care about Sora, video generation, and multi-modal large models are welcome to scan the QR code to join the conversation and exchange technical details and industry observations in depth.
For questions about business cooperation, group purchases, invoices, content, and other matters related to this event, please add Alice, the person in charge of this event, or get in touch by email. Email: jiayaning@jiqizhixin.com
About invoices: After registering successfully, you can apply for an invoice on the Activity Bank App after the event. The invoice is an electronic VAT invoice. Once issued, it will be sent to the email address used for registration.
Become a forum volunteer: Volunteers help with on-site tasks such as sign-in, guidance, and maintaining order. Work meals are included. Current students are given priority. If interested, please contact Alice.