StreamDiffusion: Real-Time Interactive Image Generation at Over 100fps
Generating one image in 10 milliseconds, or 6,000 images in a minute: what does that mean in practice?
From the demos, you can get a real feel for just how powerful this AI is.
Even as you keep adding new elements to the prompt behind an anime-style portrait, each updated image in the same style appears in an instant.
This astonishing real-time generation speed comes from StreamDiffusion, proposed by researchers from UC Berkeley, the University of Tsukuba in Japan, and elsewhere.
The new solution is a diffusion-model pipeline that enables real-time interactive image generation at over 100fps.
Paper address: https://arxiv.org/abs/2312.12491
After being open-sourced, StreamDiffusion shot straight up the GitHub trending charts, garnering 3.7k stars.
StreamDiffusion innovatively uses a batch-processing strategy instead of sequential denoising, making it roughly 1.5x faster than traditional methods. Moreover, the authors' new Residual Classifier-Free Guidance (RCFG) algorithm is up to 2.05x faster than conventional classifier-free guidance.
Most notably, the new method achieves an image-to-image generation speed of 91.07fps on an RTX 4090.
Going forward, StreamDiffusion can bring fast generation to scenarios such as the metaverse, video-game graphics rendering, and live video streaming, meeting the high-throughput demands of these applications.
Real-time image generation in particular offers powerful editing and creative capabilities to people working in game development and video rendering.
Today, applying diffusion models across various fields requires a pipeline with high throughput and low latency to keep human-computer interaction responsive.
A typical example is using a diffusion model to drive a virtual character (VTuber) that can respond fluidly to user input.
To improve throughput and real-time interactivity, current research has mainly focused on reducing the number of denoising iterations, for example from 50 iterations down to a few, or even just one.
A common strategy is to distill a multi-step diffusion model into a few steps and reframe the diffusion process using ODEs. Diffusion models have also been quantized to improve efficiency.
In the new paper, the researchers start from an orthogonal direction and introduce StreamDiffusion, a real-time diffusion pipeline designed for high-throughput interactive image generation.
Existing model-design work can be integrated with StreamDiffusion, which also supports N-step denoising diffusion models, maintaining high throughput while giving users more flexible options.
Real-time image generation | Columns 1 and 2: examples of AI-assisted real-time drawing. Column 3: real-time generation of 2D illustrations from 3D avatar renderings. Columns 4 and 5: real-time camera filters.
How is it implemented?
StreamDiffusion is a new diffusion pipeline designed to increase throughput.
It consists of several key parts:
A streaming batch strategy (Stream Batch), Residual Classifier-Free Guidance (RCFG), input-output queues, a Stochastic Similarity Filter, a pre-computation procedure, and model-acceleration tools with a tiny autoencoder.
In a diffusion model, denoising steps are performed sequentially, so U-Net processing time grows in proportion to the number of steps.
Yet producing high-fidelity images requires increasing that number.
To solve this high-latency generation problem in interactive diffusion, the researchers propose a method called Stream Batch.
In the new method, rather than waiting for one image to be fully denoised before processing the next input image, the pipeline accepts the next input image after each denoising step.
This forms a denoising batch in which each image's denoising steps are staggered.
By concatenating these interleaved denoising steps into a single batch, U-Net can efficiently process consecutive inputs together.
The input image encoded at time step t is generated and decoded at time step t + n, where n is the number of denoising steps.
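To make the idea concrete, here is a minimal Python sketch of the Stream Batch concept. The names are hypothetical stand-ins, not the repository's actual API: `unet` is assumed to be a denoiser taking `(latents, timesteps)`, and `scheduler` is assumed to expose a `timesteps` list and a `step()` update returning the next latent.

```python
import torch

class StreamBatch:
    """Rolling batch of latents, each at a different denoising stage.

    Sketch: one batched U-Net call per incoming frame advances every
    queued latent by one denoising step (interfaces are assumptions).
    """

    def __init__(self, unet, scheduler, num_steps: int):
        self.unet = unet            # (latents, timesteps) -> predicted noise
        self.scheduler = scheduler  # exposes .timesteps and .step()
        self.num_steps = num_steps
        self.queue = []             # (latent, step_index) pairs, oldest first

    @torch.no_grad()
    def push(self, new_latent: torch.Tensor):
        """Enqueue a freshly encoded frame; return a fully denoised latent
        once one has passed through all num_steps stages, else None."""
        self.queue.append((new_latent, 0))

        # Single batched forward pass over all in-flight latents.
        latents = torch.cat([lat for lat, _ in self.queue])
        timesteps = torch.tensor(
            [self.scheduler.timesteps[i] for _, i in self.queue],
            device=latents.device,
        )
        noise_pred = self.unet(latents, timesteps)

        # Advance each latent by one (staggered) denoising step.
        advanced, done = [], None
        for j, (lat, i) in enumerate(self.queue):
            lat = self.scheduler.step(noise_pred[j : j + 1], timesteps[j], lat)
            if i + 1 == self.num_steps:
                done = lat          # finished after n staggered steps
            else:
                advanced.append((lat, i + 1))
        self.queue = advanced
        return done
```

The key property is that the U-Net cost per frame is one batched call regardless of n, instead of n sequential calls per image.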
Conventional classifier-free guidance (CFG) strengthens the effect of the original condition by computing a vector difference between the unconditional (or negative-conditional) term and the original conditional term.
This brings benefits such as amplifying the influence of the prompt.
However, to compute the negative-conditional residual noise, each input latent must be paired with a negative-conditional embedding and passed through U-Net at every inference step.
To address this, the authors introduce an innovative Residual Classifier-Free Guidance (RCFG).
This method uses virtual residual noise to approximate the negative condition, so the negative-conditional noise only needs to be computed once, at the start of the process, significantly reducing the extra U-Net inference cost of negative-condition embeddings.
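The cost difference can be illustrated with a hedged Python sketch. The function names are illustrative and the exact RCFG formulation is given in the paper; the point is that standard CFG needs two U-Net evaluations per step, while Self-Negative RCFG reuses a virtual residual noise and needs only one.

```python
import torch

def cfg_step(unet, latent, t, cond, neg_cond, guidance_scale):
    """Standard classifier-free guidance: two U-Net passes per step."""
    noise_cond = unet(latent, t, cond)
    noise_neg = unet(latent, t, neg_cond)   # extra pass on every step
    return noise_neg + guidance_scale * (noise_cond - noise_neg)

def rcfg_self_negative_step(unet, latent, t, cond, virtual_residual,
                            guidance_scale):
    """Self-Negative RCFG (sketch): the negative-condition noise is
    approximated by a virtual residual noise derived once from the input
    latent, so only one U-Net pass runs per step. See the paper for the
    exact formulation."""
    noise_cond = unet(latent, t, cond)
    return virtual_residual + guidance_scale * (noise_cond - virtual_residual)
```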
Input and output queue
Converting input images into a tensor format the pipeline can handle, and conversely converting decoded tensors back into output images, both take non-negligible extra processing time.
To avoid adding these image processing times to the neural network inference process, we separate image pre- and post-processing into different threads, thereby enabling parallel processing.
In addition, by using input tensor queues, it is also possible to cope with temporary interruptions in input images due to device failures or communication errors, allowing for smooth streaming.
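A minimal sketch of this layout using Python's standard queue and threading modules follows; the helper names (`to_tensor`, `to_image`, `show`, `pipeline`) are hypothetical placeholders for the pre/post-processing and inference stages.

```python
import queue
import threading

input_q = queue.Queue(maxsize=8)   # preprocessed input tensors
output_q = queue.Queue(maxsize=8)  # decoded output tensors

def preprocess_worker(frames, to_tensor):
    """Own thread: image-to-tensor conversion never blocks inference."""
    for frame in frames:
        input_q.put(to_tensor(frame))

def postprocess_worker(to_image, show):
    """Own thread: tensor-to-image conversion and display."""
    while True:
        show(to_image(output_q.get()))

def inference_loop(pipeline):
    """The inference thread only moves tensors between the two queues."""
    while True:
        try:
            latent = input_q.get(timeout=0.1)
        except queue.Empty:
            continue  # tolerate dropped frames or transient input stalls
        output_q.put(pipeline(latent))

# Wiring (frames, to_tensor, to_image, show, pipeline assumed to exist):
# threading.Thread(target=preprocess_worker, args=(frames, to_tensor), daemon=True).start()
# threading.Thread(target=postprocess_worker, args=(to_image, show), daemon=True).start()
# inference_loop(pipeline)
```

Bounding the queues keeps memory flat under load, and the `timeout` in the inference loop is what lets the stream ride out brief input interruptions.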
The following figure shows the core diffusion inference pipeline, including VAE and U-Net.
The pipeline's speed is improved, and real-time image generation made possible, by introducing denoising batching together with caches for pre-computed prompt embeddings, sampled noise, and scheduler values.
The Stochastic Similarity Filter (SSF) is designed to save GPU power: it can dynamically switch off the diffusion pipeline, enabling fast and efficient real-time inference.
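A rough sketch of the idea follows. The exact skip-probability schedule is defined in the paper; this version is a simplified approximation: compare consecutive inputs by cosine similarity, and once the similarity exceeds the threshold η, skip the expensive pipeline call with a probability that grows with the similarity.

```python
import random
import torch
import torch.nn.functional as F

def ssf_should_run(current: torch.Tensor, previous: torch.Tensor,
                   eta: float = 0.98) -> bool:
    """Stochastic Similarity Filter (simplified sketch).

    Returns False (skip the diffusion pipeline and reuse the last
    output) with a probability that rises as consecutive inputs become
    more similar. The paper's exact probability schedule may differ.
    """
    sim = F.cosine_similarity(current.flatten(), previous.flatten(),
                              dim=0).item()
    if sim <= eta:
        return True  # inputs differ enough: always run the pipeline
    # Map similarity in (eta, 1] to a skip probability in (0, 1].
    skip_prob = (sim - eta) / (1.0 - eta)
    return random.random() >= skip_prob
```

Because skipping is probabilistic rather than a hard cutoff, the stream still refreshes occasionally even during near-static input, rather than freezing entirely.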
The U-Net architecture requires both the input latent variables and a conditional embedding.
Normally, the conditional embedding is derived from the prompt embedding and stays constant across frames.
To exploit this, the researchers pre-compute the prompt embedding and store it in a cache, which is recalled in interactive or streaming mode.
Inside U-Net, the per-frame keys and values are computed from this pre-computed prompt embedding.
The researchers therefore modified U-Net to store these key-value pairs so they can be reused; whenever the input prompt is updated, the key-value pairs are recomputed and refreshed inside U-Net.
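In sketch form, the pre-computation amounts to hoisting everything frame-invariant out of the per-frame loop. The interfaces below are assumptions for illustration; the repository's internals differ in detail.

```python
import torch

@torch.no_grad()
def build_caches(text_encoder, scheduler, prompt: str, num_steps: int):
    """Pre-compute everything that stays constant between frames
    (hypothetical interfaces, for illustration only)."""
    prompt_embeds = text_encoder(prompt)        # cached prompt embedding
    scheduler.set_timesteps(num_steps)
    return {
        "prompt_embeds": prompt_embeds,         # reused on every frame
        "timesteps": scheduler.timesteps,       # cached scheduler values
        "noise": torch.randn(1, 4, 64, 64),     # pre-sampled noise, fixed shape
    }

# Per frame, only the VAE and U-Net run; everything above is a lookup.
# When the user edits the prompt, build_caches() is called again, which is
# also the point at which U-Net cross-attention key/value pairs would be
# recomputed and stored for reuse, as described above.
```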
To optimize speed, we configured the system to use a static batch size and a fixed input size (height and width).
This approach ensures that the computation graph and memory allocation are optimized for the specific input size, resulting in faster processing.
However, this also means that images of different shapes (i.e. different heights and widths) or different batch sizes (including the batch size used for the denoising steps) cannot be handled by the same compiled configuration; each distinct size requires its own optimized setup.
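As a hedged illustration of shape specialization, the sketch below uses `torch.compile(dynamic=False)` rather than the TensorRT path the authors use; TensorRT engines are likewise built for fixed shapes. The assumption that `unet` accepts a single latent tensor is a simplification.

```python
import torch

# Fixed latent shape chosen at start-up; changing any of these later
# would require rebuilding the optimized graph or TensorRT engine.
BATCH, CHANNELS, HEIGHT, WIDTH = 4, 4, 64, 64

def compile_static(unet):
    """Specialize a denoiser for one static input shape (sketch)."""
    compiled = torch.compile(unet, dynamic=False)
    # Warm-up pass with the exact runtime shape, so the computation graph
    # and memory allocations are fixed before the real-time loop starts.
    compiled(torch.randn(BATCH, CHANNELS, HEIGHT, WIDTH))
    return compiled
```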
Figure 8 compares the efficiency of batch denoising against the original sequential U-Net loop.
When implementing the batch denoising strategy, the researchers found a significant improvement in processing time, roughly halving it compared with a traditional U-Net loop of sequential denoising steps.
Even with the neural-module acceleration tool TensorRT applied, the researchers' stream batch processing still delivers substantial efficiency gains over the original sequential diffusion pipeline across different numbers of denoising steps.
Additionally, the researchers compared the new method with the AutoPipelineForImage2Image pipeline developed by Hugging Face Diffusers.
The average inference-time comparison is shown in Table 1; the new pipeline delivers a substantial speedup.
When using TensorRT, StreamDiffusion achieves a 13x speedup at 10 denoising steps, and up to 59.6x with a single denoising step.
Even without TensorRT, StreamDiffusion is 29.7x faster than AutoPipeline with single-step denoising, and 8.3x faster with 10-step denoising.
Table 2 compares the inference time of the streaming diffusion pipeline using RCFG against conventional CFG.
With single-step denoising, the inference times of Onetime-Negative RCFG and conventional CFG are nearly identical.
As the number of denoising steps grows, however, RCFG's speed advantage over conventional CFG becomes more pronounced: at five denoising steps, Self-Negative RCFG is 2.05x faster than conventional CFG, and Onetime-Negative RCFG is 1.79x faster.
The researchers then comprehensively evaluated energy consumption; the results appear in Figures 6 and 7.
These figures compare GPU usage patterns when applying SSF (with the threshold η set to 0.98) to input videos containing periodically static scenes. The analysis shows that when the input consists mostly of static, highly similar frames, SSF can significantly reduce GPU usage.
Ablation study
Qualitative results
Images generated without any form of CFG show weak alignment with the prompt: requested changes such as altering colors or adding new elements are often not realized.
In contrast, using CFG or RCFG makes it possible to modify the original image, such as changing hair color, adding body patterns, and even adding objects like glasses. Notably, RCFG strengthens the prompt's influence compared with standard CFG.
Finally, the quality of the standard text-to-image generation results is shown in Figure 11.
With the sd-turbo model, high-quality images like those shown in Figure 11 can be generated in a single step.
Using the researchers' streaming diffusion pipeline together with sd-turbo on an RTX 4090 GPU, a Core i9-13900K CPU, and Ubuntu 22.04.3 LTS, it is feasible to produce images of this quality at over 100fps.
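For reference, here is plain single-step sd-turbo generation with the standard Diffusers pipeline, based on the model card's recommended usage (StreamDiffusion then layers the batching and caching described above on top of a model like this):

```python
import torch
from diffusers import AutoPipelineForText2Image

# Load the distilled single-step model in half precision.
pipe = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/sd-turbo", torch_dtype=torch.float16, variant="fp16"
).to("cuda")

# sd-turbo is distilled for one-step sampling; guidance is disabled
# (guidance_scale=0.0) as recommended for this model.
image = pipe(
    prompt="a cinematic photo of a cat in a spacesuit",
    num_inference_steps=1,
    guidance_scale=0.0,
).images[0]
image.save("turbo.png")
```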
Netizens get hands-on, and a wave of anime girls arrives
The project's code has been open-sourced and has already collected 3.7k stars on GitHub.
Project address: https://github.com/cumulo-autumn/StreamDiffusion
Many netizens have started generating their own anime waifus.
There are also real-time animations.
And hand-drawn generation at 10x speed.
If you're interested, why not try it yourself?