


Task-universal! Tsinghua proposes the backbone network Flowformer to achieve linear complexity | ICML 2022
Task universality is one of the core goals of foundation model research, and it is a necessary step on the path from deep learning to advanced intelligence. In recent years, thanks to the general modeling capability of the attention mechanism, the Transformer has performed well in many fields and is increasingly becoming a universal architecture. However, the cost of the standard attention mechanism grows quadratically with the sequence length, which seriously hinders its application to long-sequence modeling and large models.
To this end, a team from the School of Software, Tsinghua University studied this key issue in depth and proposed Flowformer, a task-universal backbone network that reduces the complexity to linear while maintaining the versatility of the standard Transformer. The paper was accepted by ICML 2022.
Author list: Wu Haixu, Wu Jialong, Xu Jiehui, Wang Jianmin, Long Mingsheng
Link: https://arxiv.org/pdf/2202.06258.pdf
Code: https://github.com/thuml/Flowformer
Compared with the standard Transformer, the Flowformer model proposed in this article has the following characteristics:
- Linear complexity: it can handle input sequences that are thousands of tokens long;
- No new inductive biases: it retains the general modeling ability of the original attention mechanism;
- Task universality: it achieves excellent results on five major tasks: long sequences, vision, natural language, time series, and reinforcement learning.
The input of the standard attention mechanism contains three parts: queries ($Q$), keys ($K$), and values ($V$). It is computed as follows:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^\top}{\sqrt{d}}\right)V$$

where $\mathrm{softmax}(QK^\top/\sqrt{d})$ is the attention weight matrix, and the final result is obtained as a weighted fusion of $V$. The computational complexity of this process is $O(n^2)$ in the sequence length $n$.

Note that the matrix chain multiplication problem has been studied extensively in classical algorithms. For the attention mechanism in particular, the associative law of matrix multiplication offers an optimization: since $(QK^\top)V = Q(K^\top V)$, the original quadratic complexity can be reduced to linear. However, the softmax function in the attention mechanism makes it impossible to apply the associative law directly. Removing the softmax function is therefore the key to achieving linear complexity. At the same time, much recent work has shown that softmax plays a key role in avoiding trivial attention. In summary, we are looking for a model design that achieves the following goals: (1) remove the softmax function; (2) avoid trivial attention; (3) maintain the versatility of the model.
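The effect of the associative law can be checked numerically. The following is a minimal NumPy sketch, not taken from the paper; the sizes `n` and `d` and the function names are illustrative assumptions. It shows that the two multiplication orders agree when no softmax is present, and that the standard softmax attention has to build the full $n \times n$ weight matrix.

```python
import numpy as np

def quadratic_order(Q, K, V):
    # (Q K^T) V: materializes an n x n matrix, so cost grows with n^2.
    return (Q @ K.T) @ V

def linear_order(Q, K, V):
    # Q (K^T V): only a d x d matrix is formed, so cost grows linearly with n.
    return Q @ (K.T @ V)

def softmax(x):
    # Row-wise softmax with the usual max-subtraction for numerical stability.
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
n, d = 512, 16  # hypothetical sequence length and feature dimension
Q, K, V = (rng.standard_normal((n, d)) for _ in range(3))

# Without softmax, both orders give the same result.
same = np.allclose(quadratic_order(Q, K, V), linear_order(Q, K, V))

# With softmax, the row-wise normalization sits between Q K^T and V,
# so the n x n weight matrix has to be computed explicitly.
weights = softmax(Q @ K.T / np.sqrt(d))
standard = weights @ V
```

The row-wise softmax is applied to the product $QK^\top$ as a whole, which is why the regrouping used in `linear_order` is not available for `standard`.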
2. Motivation
For goal (1), previous work often uses the kernel method to replace the softmax function, that is, it approximates the attention computation as $\phi(Q)\phi(K)^\top V$, where $\phi(\cdot)$ is a non-linear function. However, removing softmax directly causes trivial attention. To meet goal (2), previous work therefore had to introduce inductive biases, such as the locality assumption in cosFormer. These limit the versatility of the model and so fail to meet goal (3).
Competition mechanism in Softmax
To meet the above goals, we start from the basic properties of softmax. Softmax was originally proposed to extend the "winner-take-all" maximum operation into a differentiable form. Thanks to this inherent competition mechanism, it differentiates the attention weights across tokens and thereby avoids trivial attention. Based on this observation, we try to introduce a competition mechanism into the attention design, so as to avoid the trivial attention caused by kernel-method decomposition.
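The competition inside softmax can be illustrated with a few numbers. This is a minimal sketch for intuition only; the `temperature` parameter and the example scores are assumptions introduced here, not part of the article.

```python
import numpy as np

def softmax(scores, temperature=1.0):
    # Lower temperature sharpens the distribution toward the maximum entry.
    z = np.asarray(scores, dtype=float) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

scores = np.array([2.0, 1.0, 0.5])

soft = softmax(scores)                      # smooth, differentiable weights
sharp = softmax(scores, temperature=0.01)   # close to a one-hot "winner-take-all"

# Competition: the weights share a fixed budget of 1, so raising one score
# necessarily lowers the weights of all other entries.
boosted = softmax(np.array([3.0, 1.0, 0.5]))
```

Because the weights always sum to 1, a gain for one token is a loss for the others, which is the "fixed resources" behaviour that the article goes on to relate to conservation in network flow.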
Competition mechanism in network flow
We observe that in the classic flow network model from graph theory, conservation is an important phenomenon: the inflow of each node equals its outflow. Inspired by the idea that fixed resources inevitably cause competition, in this article we re-analyze the information flow in the classic attention mechanism from the perspective of network flow, and use the conservation property to introduce competition into the attention design, thereby avoiding trivial attention.
3. Flowformer
3.1 Attention mechanism from the perspective of network flow
Inside the attention mechanism, the flow of information can be described as follows: information is gathered from the sources (corresponding to $V$) to the sinks (corresponding to the results $R$) based on the learned flow capacity (corresponding to the attention weights). Outside the attention mechanism, the information of the sources ($V$) comes from the previous layer of the network, and the information of the sinks ($R$) is passed on to the following feed-forward layer.

3.2 Flow-Attention
Based on the above observations, we can control the interaction between the attention mechanism and the external network from two perspectives, inflow and outflow, to achieve "fixed resources". This causes competition within the sources and within the sinks respectively, which avoids trivial attention. Without loss of generality, we set the amount of information exchanged between the attention mechanism and the external network to the default value 1. Below, $\phi(\cdot)$ denotes the non-linear function of the kernel method, $Q_i$ the $i$-th query, and $K_j$ the $j$-th key.

(1) Inflow conservation of the sinks ($R$): It is not difficult to see that, before conservation, the amount of information flowing into the $i$-th sink is:

$$I_i = \phi(Q_i) \sum_{j} \phi(K_j)^\top$$

To fix the amount of information flowing into each sink to 1, we introduce $\frac{1}{I}$ as a normalization in the calculation of the information flow (attention weights). After normalization, the inflow of the $i$-th sink is:

$$\frac{\phi(Q_i)}{I_i} \sum_{j} \phi(K_j)^\top = 1$$

At this point, because the inflow of the sinks is conserved, the sources ($V$) naturally compete with each other. Computing the amount of information provided by each source ($V$) gives:

$$\hat{O}_j = \phi(K_j) \sum_{i} \frac{\phi(Q_i)^\top}{I_i}$$

This is the amount of information provided by each source under competition, and it also represents the importance of each source.

(2) Outflow conservation of the sources ($V$): Similarly, before conservation, the amount of information flowing out of the $j$-th source is:

$$O_j = \phi(K_j) \sum_{i} \phi(Q_i)^\top$$

To fix the amount of information flowing out of each source to 1, we introduce $\frac{1}{O}$ as a normalization in the calculation of the information flow (attention weights). After normalization, the outflow of the $j$-th source is:

$$\frac{\phi(K_j)}{O_j} \sum_{i} \phi(Q_i)^\top = 1$$

At this point, because the outflow of the sources is conserved, the sinks ($R$) naturally compete with each other. Computing the amount of information received by each sink ($R$) gives:

$$\hat{I}_i = \phi(Q_i) \sum_{j} \frac{\phi(K_j)^\top}{O_j}$$

This is the amount of information that each final result needs to receive under competition.

(3) Overall design: Based on the above results, we design the following Flow-Attention mechanism, which consists of three parts: Competition, Aggregation, and Allocation.

Competition: $\hat{V} = \mathrm{softmax}(\hat{O}) \odot V$
Aggregation: $A = \frac{\phi(Q)}{I}\left(\phi(K)^\top \hat{V}\right)$
Allocation: $R = \mathrm{sigmoid}(\hat{I}) \odot A$

Competition introduces the competition mechanism to highlight important information; Aggregation achieves linear complexity based on the associative law of matrix multiplication; Allocation introduces the competition mechanism to control the amount of information passed to the next layer. All operations in this process have linear complexity. Moreover, the design of Flow-Attention relies only on the conservation principle of network flow to reorganize the information flow, so it introduces no new inductive biases and preserves the versatility of the model. Flowformer is obtained by replacing the quadratic-complexity attention in the standard Transformer with Flow-Attention.

4. Experiments
This paper conducts extensive experiments on standard datasets. Flowformer performed well on all five tasks, verifying the versatility of the model. Please see the paper for detailed experimental results.

5. Analysis
To further explain how Flowformer works, we visualized the attention of Flow-Attention on the ImageNet classification task. The visualization shows that introducing competition into the attention design through Flow-Attention effectively avoids trivial attention. More visualization experiments can be found in the paper.

6. Summary
The Flowformer proposed in this article introduces the conservation principle of network flow into the design and naturally brings a competition mechanism into the attention computation. This effectively avoids the trivial attention problem and maintains the versatility of the standard Transformer while achieving linear complexity. Flowformer has achieved excellent results on five major tasks: long sequences, vision, natural language, time series, and reinforcement learning. In addition, its design concept of "no special inductive bias" is also inspiring for research on general-purpose foundation architectures. In future work, we will further explore the potential of Flowformer for large-scale pre-training.
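As a closing illustration, the Competition, Aggregation, and Allocation steps described for Flow-Attention can be sketched in NumPy. This is a minimal single-head, non-causal sketch written for this summary, not the official implementation in the repository linked above. The choice of sigmoid as the non-linear function `phi`, the function names, and the tensor sizes are assumptions, and details such as numerical-stability constants are omitted.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Softmax over a 1-D vector (competition across all sources).
    e = np.exp(x - x.max())
    return e / e.sum()

def flow_attention(Q, K, V, phi=sigmoid):
    """Q: (n, d) queries, K: (m, d) keys, V: (m, d_v) values.

    phi must be positive so that the flows I and O are non-zero
    (sigmoid is an assumption made for this sketch).
    """
    q, k = phi(Q), phi(K)
    I = q @ k.sum(axis=0)                        # (n,) inflow of each sink
    O = k @ q.sum(axis=0)                        # (m,) outflow of each source
    O_hat = k @ (q / I[:, None]).sum(axis=0)     # (m,) source outflow after sink conservation
    I_hat = q @ (k / O[:, None]).sum(axis=0)     # (n,) sink inflow after source conservation
    V_hat = softmax(O_hat)[:, None] * V          # Competition among sources
    A = (q / I[:, None]) @ (k.T @ V_hat)         # Aggregation via Q (K^T V) regrouping
    return sigmoid(I_hat)[:, None] * A           # Allocation to each sink

rng = np.random.default_rng(0)
n, m, d, d_v = 6, 5, 4, 3  # hypothetical sizes
Q = rng.standard_normal((n, d))
K = rng.standard_normal((m, d))
V = rng.standard_normal((m, d_v))
R = flow_attention(Q, K, V)
```

Every intermediate here is either a vector of length `n` or `m` or a `d x d_v` matrix, so no `n x m` attention matrix is ever formed, which is where the linear complexity comes from.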