
Just one sentence to make the picture move. Apple uses large model animation to generate, and the result can be edited directly.

王林
2024-03-05 09:46:38

Large models continue to reshape the creative field, with video generation systems such as Sora leading the latest wave. But even as Sora sets the trend, Apple's latest research results also deserve attention.

Apple researchers recently released a framework called "Keyframer" that uses large language models to generate animations. The framework lets users easily animate static 2D images through natural language prompts. The research demonstrates the potential of language models in animation design, offering animation designers a more efficient and intuitive tool.


Paper address: https://arxiv.org/pdf/2402.06071.pdf

Specifically, this research combines emerging design principles for natural-language-prompted design artifacts with the code-generation capabilities of LLMs to build Keyframer, a new AI-driven animation tool. Keyframer lets users create animated illustrations from static 2D images through natural language prompts. Built on GPT-4, Keyframer generates CSS animation code to animate an input SVG (Scalable Vector Graphics) image.

In addition, Keyframer lets users directly edit the generated animation through multiple editor types.

Users can continuously refine their designs through repeated prompting, using the design variants the LLM generates to explore new design directions. Keyframer has not yet been released publicly.

In presenting this research, Apple notes that applying LLMs to animation is still largely unexplored and raises new challenges, such as how users can effectively describe motion in natural language. While text-to-image tools such as DALL·E and Midjourney are impressive, animation design involves more complex considerations, such as timing and coordination, that are difficult to capture in a single prompt.

To see the effect of this research, users simply upload an image, enter a prompt such as "let the stars twinkle" into the prompt box, and click Generate.
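The paper does not print the exact CSS that Keyframer emits for this prompt, but a generated twinkle animation would plausibly look like the sketch below. The `#star1` and `#star2` selectors are hypothetical identifiers standing in for the IDs of the star elements in the uploaded SVG:

```css
/* Hypothetical sketch of LLM-generated output for "let the stars twinkle".
   Assumes the uploaded SVG contains elements with ids star1 and star2. */
@keyframes twinkle {
  0%, 100% { opacity: 1; }
  50%      { opacity: 0.2; }
}

#star1 {
  animation: twinkle 1.5s ease-in-out infinite;
}

#star2 {
  /* A phase offset keeps the stars from blinking in unison. */
  animation: twinkle 1.5s ease-in-out 0.75s infinite;
}
```

Because the output is plain CSS keyed to element IDs, each property here (duration, delay, easing, opacity values) maps directly to a field that Keyframer can expose for editing.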


Users can generate multiple animation designs in a batch and adjust properties such as color codes and animation duration in separate windows. No coding experience is required, since Keyframer automatically converts these changes to CSS, and the code itself remains fully editable. This description-based approach is much simpler than other forms of AI-generated animation, which often require several different applications and some coding experience.

Introduction to Keyframer

Keyframer is an LLM-powered application for creating animations from static images. It leverages the code-generation capabilities of LLMs and the semantic structure of Scalable Vector Graphics (SVG) to generate animations from natural language prompts provided by the user.


Input: The system provides an input area where users can paste the SVG code of the image they want to animate (SVG is a standard, popular image format, commonly used for illustrations because of its scalability and cross-platform compatibility). In Keyframer, a rendering of the SVG is displayed next to the code editor so that the user can preview the visual design of the image. As shown in Figure 2, the SVG code for the Saturn illustration contains identifiers such as sky, rings, and so on.
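The paper only shows the Saturn SVG in a figure, but the structure Keyframer relies on, namely named elements that CSS selectors can target, looks roughly like this hypothetical, simplified fragment:

```xml
<!-- Hypothetical, simplified stand-in for the paper's Saturn illustration.
     The id attributes (sky, planet, rings) are what let the generated CSS
     target individual parts of the drawing. -->
<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 200 200">
  <rect id="sky" width="200" height="200" fill="#0b1a3a"/>
  <circle id="planet" cx="100" cy="100" r="40" fill="#e0a030"/>
  <ellipse id="rings" cx="100" cy="100" rx="70" ry="18"
           fill="none" stroke="#d8c9a3" stroke-width="4"/>
</svg>
```

Descriptive `id` names like these are what allow a natural language request ("make the rings glow") to be grounded in a concrete element of the drawing.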


GPT Prompts: The system lets users enter natural language prompts to create animations. Users can request a single design ("make the planet rotate") or multiple design variants ("create a design with 3 twinkling stars"), then click the Generate Animation button to start the request. Before passing the user's request to GPT, the system augments the prompt with the complete raw SVG XML and specifies the format of the LLM response.

GPT Output: Once the prompt request is sent, GPT streams back the response, which consists of one or more CSS fragments, as shown in Figure 3.
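For a request like "make the planet rotate," the returned CSS fragment might resemble the following sketch. It again assumes a hypothetical `#planet` element in the input SVG; the paper does not print the exact output:

```css
/* Hypothetical sketch of a returned CSS fragment for "make the planet rotate". */
@keyframes spin {
  from { transform: rotate(0deg); }
  to   { transform: rotate(360deg); }
}

#planet {
  /* transform-box and transform-origin make the SVG element spin around
     its own center rather than the SVG canvas's top-left corner. */
  transform-box: fill-box;
  transform-origin: center;
  animation: spin 4s linear infinite;
}
```

Fragments like this are self-contained, which is what allows Keyframer to render each response immediately and hand the code to its editors for further modification.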


Rendering: The rendering part includes (1) a visual rendering of each animation, along with a one-sentence explanation generated by the LLM, and (2) a series of editors for modifying the design.

The code editor is implemented with CodeMirror; the property editor provides a property-specific UI for editing the code. For example, to edit a color, the system provides a color picker. Figure 5 shows the code editor and the property editor.


Iteration: To support deeper exploration during the animation creation process (DG1), the system also lets users iteratively build on generated animations through prompts. Below each generated design is an "Add New Prompt" button; clicking it opens a new form at the bottom of the page where the user can extend the design with a new prompt.

Saved-designs sidebar and summary mode. The system lets users star designs and add them to a sidebar, as shown on the right side of Figure 6. In addition, a summary mode hides all text editors and displays only the animations and their prompts, letting users quickly revisit previous prompts and designs.


In the experiment, the Apple team recruited 13 participants (6 women, 7 men) to try out Keyframer. Table 1 lists the participants and their skills.

Even professional motion designer EP13 sees Keyframer's potential to expand their capabilities: "I'm a little worried that these tools will replace our work, because the potential is so great. But thinking it over carefully, this research will only improve our skills. That should be something to be happy about."


Overall, participants were satisfied with their Keyframer experience, giving an average score of 3.9, between satisfied (4) and neutral (3). Participants generated 223 designs in total, an average of 17.2 per participant. Figure 8 shows examples of final animations from two participants.


Please refer to the original paper for more technical details.


Statement:
This article is reproduced from 51cto.com. In case of infringement, please contact admin@php.cn for deletion.