Turning scripts into videos takes artificial intelligence only one step
"Generative AI research advances creative expression by giving people the tools to create new content quickly and easily," Meta said in a blog post announcing the work. With a sentence or a few lines of text, Make-A-Video brings your imagination to life, creating one-of-a-kind videos full of vivid color and scenery."
Meta CEO Mark Zuckerberg called the work "amazing progress" on Facebook, adding: "Generating a video is much harder than generating a photo because, in addition to correctly generating every pixel, the system must also predict how those pixels will change over time."
The videos are no longer than five seconds, contain no audio, and cover a wide range of prompts. The best way to judge a model's performance is to observe its output, but no one outside Meta is currently allowed to access the model. This means the published clips were likely carefully selected by the developers to show the system in the best light.
Still, while these videos are obviously computer-generated, the output of such AI models is likely to improve rapidly in the near future. By comparison, in just a few years AI image generators have gone from producing barely comprehensible pictures to photorealistic content. Progress in video may be slower because of the near-infinite complexity of the medium, but the value of seamless video generation will push many institutions and companies to devote significant resources to the problem.
Like text-to-image models, it has potentially harmful applications.
In a blog post announcing Make-A-Video, Meta noted that the video generation tool could be invaluable "to creators and artists." But, as with text-to-image models, the outlook is fraught: the output of these tools could be used for disinformation and propaganda. Meta says it wants to be thoughtful about how it builds such generative AI systems, and has so far published only a paper on the Make-A-Video model. The company said it plans to release a demo version of the system, but did not say when, or how access to the model would be restricted.
It is worth mentioning that Meta is not the only organization working on AI video generators. Earlier this year, a team of researchers from Tsinghua University and the Beijing Academy of Artificial Intelligence (BAAI) released their own text-to-video model, named CogVideo.
Meta researchers note in a paper describing the model that Make-A-Video was trained on pairs of images and captions, as well as on unlabeled video clips. The training content comes from two datasets (WebVid-10M and HD-VILA-100M), which together contain millions of videos spanning hundreds of thousands of hours of footage. This includes stock footage from sites like Shutterstock that was scraped from the web.
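To make this two-source training idea concrete, here is a minimal PyTorch sketch in which captioned images supervise a text-conditioned frame decoder while caption-free clips supervise a temporal module. Every module name, tensor shape, and loss below (TinyTextEncoder, TinyFrameDecoder, TinyTemporalModel) is an illustrative assumption, not Meta's actual architecture or objective.

```python
# Sketch of training on two data sources: captioned images teach
# text-to-visual correspondence; unlabeled clips teach temporal dynamics.
# All components are toy stand-ins, not Make-A-Video's real design.
import torch
import torch.nn as nn

class TinyTextEncoder(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)

    def forward(self, tokens):                  # (B, T_text) -> (B, dim)
        return self.embed(tokens).mean(dim=1)

class TinyFrameDecoder(nn.Module):
    """Maps a text embedding to a single 64x64 RGB frame."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Linear(dim, 3 * 64 * 64)

    def forward(self, cond):                    # (B, dim) -> (B, 3, 64, 64)
        return self.net(cond).view(-1, 3, 64, 64)

class TinyTemporalModel(nn.Module):
    """Predicts the next frame from the current one -- a stand-in for the
    temporal layers that the unlabeled video is used to train."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 3, kernel_size=3, padding=1)

    def forward(self, frame):                   # (B, 3, H, W) -> (B, 3, H, W)
        return self.net(frame)

text_enc, frame_dec, temporal = TinyTextEncoder(), TinyFrameDecoder(), TinyTemporalModel()
params = list(text_enc.parameters()) + list(frame_dec.parameters()) + list(temporal.parameters())
opt = torch.optim.Adam(params, lr=1e-4)

# Fake stand-ins for a WebVid/HD-VILA-style loader: captioned images and
# caption-free 16-frame clips.
captions = torch.randint(0, 1000, (8, 12))       # (B, tokens)
images = torch.rand(8, 3, 64, 64)                # (B, C, H, W)
clips = torch.rand(8, 16, 3, 64, 64)             # (B, frames, C, H, W)

# Loss 1: text-image pairs supervise the text-conditioned frame decoder.
pair_loss = nn.functional.mse_loss(frame_dec(text_enc(captions)), images)

# Loss 2: unlabeled video supervises temporal prediction (frame t -> t+1).
prev, nxt = clips[:, :-1].flatten(0, 1), clips[:, 1:].flatten(0, 1)
video_loss = nn.functional.mse_loss(temporal(prev), nxt)

opt.zero_grad()
(pair_loss + video_loss).backward()
opt.step()
```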
In addition to blurry footage and choppy animation, the model has a number of technical limitations, the researchers note in their paper. For example, the training method cannot learn information that a human could only infer by watching a video, such as whether a waving hand moves from left to right or from right to left. Other limitations include the inability to generate videos longer than five seconds, videos with multiple scenes and events, or higher resolutions. Make-A-Video currently outputs 16 frames of video at a resolution of 64×64 pixels, which a separate AI model then upscales to 768×768.
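That coarse-to-fine output pipeline can be sketched as two stages: a base generator that emits 16 frames at 64×64, followed by a separate upsampling model that brings them to 768×768. In the placeholder code below, random tensors and bilinear interpolation stand in for the real models; only the tensor shapes reflect what the paper reports.

```python
# Two-stage pipeline sketch: low-resolution video generation, then upscaling.
# The "models" here are placeholders used only to make the shapes concrete.
import torch
import torch.nn.functional as F

def base_generator(batch_size: int = 1) -> torch.Tensor:
    """Stand-in for the low-resolution generator: (B, 16, 3, 64, 64)."""
    return torch.rand(batch_size, 16, 3, 64, 64)

def upsampler(frames: torch.Tensor) -> torch.Tensor:
    """Stand-in for the separate super-resolution model: 64x64 -> 768x768."""
    b, t, c, h, w = frames.shape
    flat = frames.flatten(0, 1)                          # (B*T, C, H, W)
    up = F.interpolate(flat, size=(768, 768), mode="bilinear", align_corners=False)
    return up.reshape(b, t, c, 768, 768)

low_res = base_generator()
high_res = upsampler(low_res)
print(low_res.shape)    # torch.Size([1, 16, 3, 64, 64])
print(high_res.shape)   # torch.Size([1, 16, 3, 768, 768])
```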
Meta’s team also points out that, like all AI models trained on data scraped from the web, Make-A-Video learns and potentially amplifies social biases, including harmful ones. Text-to-image models illustrate the problem: asked to generate an image of a “terrorist,” such a model is likely to depict a person wearing a turban. Without open access, however, it is hard to say which biases Meta’s model has learned.
Meta said the company is “openly sharing this generative AI research and results with the technology community to gain their feedback, and will continue to use our responsible AI framework to refine and evolve our approach to this emerging technology.”
As AI generators for images and video become more and more popular, AI generation tools for other arts, such as music, will likely appear soon, if they have not already.