


Currently, the amazing innovative capabilities of large-scale models continue to impact the creative field, especially representatives of video generation technology like Sora. Although Sora has led a new generation of trends, it may be worth paying attention to Apple's latest research results now.
Apple researchers recently released a framework called "Keyframer" that can use large language models to generate animations. This framework allows users to easily create animations for static 2D images through natural language prompts. This research demonstrates the potential of language models in designing animations, providing animation designers with more efficient and intuitive tools.
Paper address: https://arxiv.org/pdf/2402.06071.pdf
Specific Specifically, this research combines emerging design principles based on language prompt design artifacts and the code generation capabilities of LLM to build a new AI-driven animation tool Keyframer. Keyframer allows users to create animated illustrations from static 2D images through natural language prompts. With GPT-4, Keyframer can generate CSS animation code to animate the input SVG (Scalable Vector Graphic).
In addition, Keyframer supports users to directly edit the generated animation through multiple editor types.
Users can continuously improve their designs using the design variants generated by LLM through repeated prompts and requests, thereby thinking in new design directions. However, Keyframer has not yet been made public.
In doing this research, Apple stated that the application of LLM in animation has not been fully explored and brings new challenges, such as how users can effectively describe motion in natural language. . While Vincentian graphics tools such as Dall・E and Midjourney are currently great, animation design requires more complex considerations, such as timing and coordination, that are difficult to fully summarize in a single prompt.
Users simply upload an image, enter something like "let the stars twinkle" into the prompt box, and click Generate to see the effects of this study.
Users can generate multiple animation designs in a batch and adjust properties such as color code and animation duration in separate windows. No coding experience is required as Keyframer automatically converts these changes to CSS, and the code itself is fully editable. This description-based approach is much simpler than other forms of AI-generated animation, which often require several different applications and some coding experience.
Introduction to Keyframer
Keyframer is an LLM-powered application designed to create animations from static images. Keyframer leverages the code generation capabilities of LLM and the semantic structure of static vector graphics (SVG) to generate animations based on natural language cues provided by the user.
Input: The system provides an input area where users can paste what they want SVG image code to animate (SVG is a standard and popular image format commonly used in illustrations for its scalability and compatibility on multiple platforms). In Keyframer, a rendering of the SVG is displayed next to the code editor so that the user can preview the visual design of the image. As shown in Figure 2, the SVG code for the Saturn illustration contains identifiers such as sky, rings, etc.
GPT Prompts: This system allows users to enter natural language prompts to create animations. Users can request a single design (make the planet rotate) or multiple design variations (create a design with 3 twinkling stars), and then click the Generate Animation button to start the request. Before passing user requests to GPT, the study refines its prompts with the complete raw SVG XML and specifies the format of the LLM response.
GPT Output: Once the prompt request starts, GPT transmits the response, which consists of one or more CSS fragments, as shown in Figure 3.
Rendering: The rendering part includes (1) each animation is visually rendered and rendered by LLM Generated 1-sentence explanation (2) A series of editors for modifying designs.
The code editor is implemented using CodeMirror; the property editor provides property-specific UI for editing code, for example, to edit color, the study provides a color picker. Figure 5 shows the code editor and property editor icons.
Iteration: To support users to go deeper in the animation creation process (DG1) Exploration, the study also provides a feature that allows users to iteratively build on the generated animation using prompts. There is a button "Add New Prompt" below each generated design; clicking this button opens a new form at the bottom of the page for the user to extend their design with new prompts.
Save the designed sidebar and summary . The system allows users to star designs and add them to the sidebar, as shown on the right side of Figure 6. In addition, the system has a summary mode that hides all text editors and displays animations and their prompts, allowing users to quickly revisit previous prompts and designs.
During the experiment, the Apple team selected 13 participants (6 women, 7 men) to try out Keyframer. Table 1 provides some information about the participants and the skills they mastered.
Even professional motion designer "EP13" sees the potential of Keyframer to expand its capabilities: "I'm a little worried that these tools will replace our work because its potential is so great. But if you think about it carefully, this research will only improve our skills. It should be something to be happy about."
Overall, participation users are satisfied with their Keyframer experience. Participants gave an average score of 3.9, ranging between satisfied (4) and neutral (3). Participants generated 223 designs. On average, each participant generated 17.2 designs. Figure 8 shows an example of the final animation for two participants.
Please refer to the original paper for more technical details.
The above is the detailed content of Just one sentence to make the picture move. Apple uses large model animation to generate, and the result can be edited directly.. For more information, please follow other related articles on the PHP Chinese website!

LangChain is a powerful toolkit for building sophisticated AI applications. Its agent architecture is particularly noteworthy, allowing developers to create intelligent systems capable of independent reasoning, decision-making, and action. This expl

Radial Basis Function Neural Networks (RBFNNs): A Comprehensive Guide Radial Basis Function Neural Networks (RBFNNs) are a powerful type of neural network architecture that leverages radial basis functions for activation. Their unique structure make

Brain-computer interfaces (BCIs) directly link the brain to external devices, translating brain impulses into actions without physical movement. This technology utilizes implanted sensors to capture brain signals, converting them into digital comman

This "Leading with Data" episode features Ines Montani, co-founder and CEO of Explosion AI, and co-developer of spaCy and Prodigy. Ines offers expert insights into the evolution of these tools, Explosion's unique business model, and the tr

This article explores Retrieval Augmented Generation (RAG) systems and how AI agents can enhance their capabilities. Traditional RAG systems, while useful for leveraging custom enterprise data, suffer from limitations such as a lack of real-time dat

SQL Integrity Constraints: Ensuring Database Accuracy and Consistency Imagine you're a city planner, responsible for ensuring every building adheres to regulations. In the world of databases, these regulations are known as integrity constraints. Jus

PySpark, the Python API for Apache Spark, empowers Python developers to harness Spark's distributed processing power for big data tasks. It leverages Spark's core strengths, including in-memory computation and machine learning capabilities, offering

Harnessing the Power of Self-Consistency in Prompt Engineering: A Comprehensive Guide Have you ever wondered how to effectively communicate with today's advanced AI models? As Large Language Models (LLMs) like Claude, GPT-3, and GPT-4 become increas


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

Video Face Swap
Swap faces in any video effortlessly with our completely free AI face swap tool!

Hot Article

Hot Tools

MinGW - Minimalist GNU for Windows
This project is in the process of being migrated to osdn.net/projects/mingw, you can continue to follow us there. MinGW: A native Windows port of the GNU Compiler Collection (GCC), freely distributable import libraries and header files for building native Windows applications; includes extensions to the MSVC runtime to support C99 functionality. All MinGW software can run on 64-bit Windows platforms.

SublimeText3 English version
Recommended: Win version, supports code prompts!

SublimeText3 Chinese version
Chinese version, very easy to use

VSCode Windows 64-bit Download
A free and powerful IDE editor launched by Microsoft

DVWA
Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software