Animate a picture with a single sentence: Apple uses a large language model to generate animations, and the results can be edited directly

The generative capabilities of large models continue to reshape creative work, most visibly through video generation systems such as Sora. Sora may have set off the latest wave of interest, but Apple's newest research is also worth a look.

Apple researchers recently released a framework called "Keyframer" that uses large language models (LLMs) to generate animations. It lets users animate static 2D images through natural language prompts. The research demonstrates the potential of language models in animation design and points toward more efficient, more intuitive tools for animation designers.

Paper address: https://arxiv.org/pdf/2402.06071.pdf

Specifically, the research combines emerging design principles for language-based prompting of design artifacts with the code generation capabilities of LLMs to build Keyframer, a new AI-driven animation tool. Keyframer lets users create animated illustrations from static 2D images through natural language prompts. Using GPT-4, Keyframer generates CSS animation code that animates an input SVG (Scalable Vector Graphics) image.

In addition, Keyframer lets users directly edit the generated animation through several types of editors.

Users can refine their designs iteratively through repeated prompts and can request LLM-generated design variants, which helps them explore new design directions. However, Keyframer has not yet been made publicly available.

In presenting this research, Apple noted that the application of LLMs to animation has not been fully explored and raises new challenges, such as how users can effectively describe motion in natural language. Text-to-image tools such as DALL·E and Midjourney already work well, but animation design involves more complex considerations, such as timing and coordination, that are difficult to capture fully in a single prompt.

Users simply upload an image, type something like "let the stars twinkle" into the prompt box, and click Generate to see the result.

Users can generate multiple animation designs in one batch and adjust properties such as color codes and animation duration in separate windows. No coding experience is required, because Keyframer automatically converts these changes to CSS, and the code itself remains fully editable. This description-based approach is much simpler than other forms of AI-generated animation, which often require several different applications and some coding experience.
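
To make the idea of converting a property change into CSS more concrete, here is a minimal Python sketch. It is not Keyframer's implementation, which the article does not describe; the function name `set_css_property` and the sample stylesheet are assumptions made for illustration, and the regular expressions only handle flat rules without nested braces.

```python
import re


def set_css_property(css: str, selector: str, prop: str, value: str) -> str:
    """Return `css` with `prop` set to `value` in the rule for `selector`.

    Hypothetical helper: handles only flat rules such as "#stars { ... }".
    """
    # Capture "<selector> {", the declarations, and the closing "}".
    rule_re = re.compile(r"(" + re.escape(selector) + r"\s*\{)([^}]*)(\})")
    # Match an existing "prop: ...;" declaration, but not "x-prop: ...;".
    decl_re = re.compile(r"(?<![\w-])" + re.escape(prop) + r"\s*:\s*[^;]*;")
    new_decl = f"{prop}: {value};"

    def update_rule(match: re.Match) -> str:
        body = match.group(2)
        if decl_re.search(body):
            # Overwrite the existing declaration in place.
            body = decl_re.sub(lambda _: new_decl, body, count=1)
        else:
            # Append a new declaration at the end of the rule.
            body = body.rstrip() + f"\n  {new_decl}\n"
        return match.group(1) + body + match.group(3)

    return rule_re.sub(update_rule, css, count=1)


# Illustrative stylesheet, as a color picker or duration field might edit it.
SAMPLE_CSS = "#stars {\n  animation: twinkle 2s infinite;\n  fill: #ffcc00;\n}\n"
recolored = set_css_property(SAMPLE_CSS, "#stars", "fill", "#ffffff")
slowed = set_css_property(SAMPLE_CSS, "#stars", "animation-duration", "4s")
```

A property editor built this way only rewrites one declaration at a time, so the rest of the generated CSS stays untouched and can still be edited by hand.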

Introduction to Keyframer

Keyframer is an LLM-powered application designed to create animations from static images. It leverages the code generation capabilities of LLMs and the semantic structure of static vector graphics (SVG) to generate animations from natural language prompts provided by the user.

Input: The system provides an input area where users can paste the code of the SVG image they want to animate (SVG is a standard, widely used image format, popular for illustrations because of its scalability and compatibility across platforms). In Keyframer, a rendering of the SVG is displayed next to the code editor so that the user can preview the visual design of the image. As shown in Figure 2, the SVG code for the Saturn illustration contains identifiers such as sky and rings.
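
The identifiers matter because they give a prompt something to refer to. As a rough illustration, the Python sketch below lists the `id` attributes found in a piece of SVG code using the standard library XML parser. The SVG shown is a made-up stand-in, not the Saturn illustration from the paper; only the `sky` and `rings` identifiers come from the article, and `planet` and the helper name `list_svg_ids` are assumptions.

```python
import xml.etree.ElementTree as ET


def list_svg_ids(svg_source: str) -> list[str]:
    """Return the id of every element in the SVG, in document order."""
    root = ET.fromstring(svg_source)
    return [el.get("id") for el in root.iter() if el.get("id")]


# Hypothetical stand-in for an illustration pasted into the input area.
SATURN_SVG = """<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">
  <rect id="sky" width="100" height="100" fill="#0b1d3a"/>
  <ellipse id="rings" cx="50" cy="50" rx="40" ry="10" fill="none" stroke="#d9b36c"/>
  <circle id="planet" cx="50" cy="50" r="20" fill="#e8c37a"/>
</svg>"""

svg_ids = list_svg_ids(SATURN_SVG)
```

CSS can target each of these elements with an id selector such as `#rings`, which is presumably why meaningful identifiers in the SVG help the model produce animations for the right parts of the image.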

GPT Prompts: The system lets users enter natural language prompts to create animations. Users can request a single design ("make the planet rotate") or multiple design variants ("create 3 designs of twinkling stars"), and then click the Generate Animation button to submit the request. Before passing the user's request to GPT, the system augments the prompt with the complete raw SVG XML and specifies the format the LLM response should take.
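
The prompt assembly step could look roughly like the Python sketch below. The article does not give the actual prompt text, so the instruction wording, the `/* design N */` marker convention, and the function name `build_prompt` are all assumptions chosen for illustration.

```python
# Hypothetical response-format instructions; the real wording is not given
# in the article.
FORMAT_INSTRUCTIONS = (
    "Respond with CSS only. If several designs are requested, start each "
    "one with a comment line of the form /* design N */."
)


def build_prompt(svg_source: str, user_request: str) -> str:
    """Combine the user's request, the raw SVG XML, and format instructions."""
    sections = [
        "Animate the following SVG using CSS animations.",
        "SVG source:\n" + svg_source.strip(),
        "Request: " + user_request.strip(),
        FORMAT_INSTRUCTIONS,
    ]
    return "\n\n".join(sections)


example_prompt = build_prompt(
    '<svg xmlns="http://www.w3.org/2000/svg"><circle id="planet" r="20"/></svg>',
    "make the planet rotate",
)
```

Including the full SVG source means the model can see element identifiers such as `planet` and write selectors against them, while the format instructions make the reply easier to parse automatically.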

GPT Output: Once the prompt request is submitted, GPT streams back a response consisting of one or more CSS snippets, as shown in Figure 3.
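
Splitting a reply into separate designs might be handled along the lines of the following Python sketch. It assumes, purely for illustration, that each design begins with a comment line of the form `/* design N */`; the article does not state the real response format, and `split_css_designs` is a hypothetical name.

```python
import re

# Assumed marker line that introduces each design in the model's reply.
DESIGN_MARKER = re.compile(
    r"^/\*\s*design\s+\d+\s*\*/\s*$", re.MULTILINE | re.IGNORECASE
)


def split_css_designs(response: str) -> list[str]:
    """Split a reply into one CSS snippet per design, dropping empty parts."""
    parts = DESIGN_MARKER.split(response)
    return [part.strip() for part in parts if part.strip()]


# Made-up reply containing two design variants.
SAMPLE_RESPONSE = """/* design 1 */
#stars { animation: twinkle 1s infinite; }
@keyframes twinkle { 50% { opacity: 0.2; } }
/* design 2 */
#stars { animation: pulse 2s infinite; }
@keyframes pulse { 50% { transform: scale(1.2); } }
"""

designs = split_css_designs(SAMPLE_RESPONSE)
```

A reply with no marker lines comes back as a single snippet, which matches the case where the user asked for only one design.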

Rendering: The rendering section includes (1) a visual rendering of each animation together with a one-sentence explanation generated by the LLM, and (2) a set of editors for modifying the design.

The code editor is implemented using CodeMirror. The property editor provides property-specific UI for editing the code; for example, a color picker is provided for editing colors. Figure 5 shows the code editor and property editor icons.

Iteration: To support deeper exploration during animation creation (DG1), the system also lets users build iteratively on a generated animation using prompts. Below each generated design is an "Add New Prompt" button; clicking it opens a new form at the bottom of the page where the user can extend the design with a new prompt.

Saved designs sidebar and summary: The system lets users star designs and add them to a sidebar, as shown on the right side of Figure 6. In addition, the system has a summary mode that hides all text editors and displays only the animations and their prompts, so users can quickly revisit previous prompts and designs.

During the experiment, the Apple team selected 13 participants (6 women, 7 men) to try out Keyframer. Table 1 provides some information about the participants and the skills they mastered.

Even a professional motion designer, "EP13", sees Keyframer's potential to extend designers' capabilities: "I'm a little worried that these tools will replace our work because its potential is so great. But if you think about it carefully, this research will only improve our skills. It should be something to be happy about."

Overall, participants were satisfied with their Keyframer experience. They gave an average score of 3.9, which falls between neutral (3) and satisfied (4). Participants generated 223 designs in total, an average of 17.2 per participant. Figure 8 shows examples of the final animations from two participants.

Please refer to the original paper for more technical details.

Statement
This article is reproduced from 51CTO.COM. If there is any infringement, please contact admin@php.cn to have it removed.