


New backbone of lightweight visual network: efficient Fourier operator token mixer
1. Background
Over the years, three vision backbone networks, Transformer, Large-kernel CNN, and MLP, have achieved great success in a wide range of computer vision tasks. , This is mainly due to their ability to efficiently fuse information on a global scale
Transformer, CNN and MLP are the three mainstream neural networks currently, and they use different ways to implement them. Global scope Token fusion. In the Transformer network, the self-attention mechanism uses the correlation of query-key pairs as the weight of Token fusion. CNN achieves similar performance to Transformer by expanding the size of the convolution kernel. MLP implements another powerful paradigm between all tokens through full connectivity. Although these methods are effective, they have high computational complexity (O(N^2)) and are difficult to deploy on devices with limited storage and computing capabilities, thus limiting the application scope of many models
2. AFF Token Mixer: lightweight, global, adaptive
In order to solve the computationally expensive problem, researchers developed a method called adaptive Fu Efficient global token fusion algorithm of Adaptive Fourier Filter (AFF). This algorithm uses Fourier transform to convert the Token set into the frequency domain, and learns a filter mask capable of adaptive content in the frequency domain to perform adaptive filtering operations on the Token set converted into the frequency domain space
Adaptive Frequency Filters: Efficient Global Token Mixers
##Click this link to access the original text: https://arxiv .org/abs/2307.14008
According to the frequency domain convolution theorem, the mathematical equivalent operation of AFF Token Mixer is a convolution operation performed in the original domain, which is equivalent to the Fourier Hadamard product operation in the domain. This means that AFF Token Mixer can achieve content-adaptive global token fusion by using a dynamic convolution kernel in the original domain, whose spatial resolution is the same as the size of the token set (as shown in the right subfigure of the figure below)
It is well known that dynamic convolution is computationally expensive, especially when using dynamic convolution kernels with large spatial resolution. This cost seems to be too high for efficient/lightweight network design. It's unacceptable. However, the AFF Token Mixer proposed in this article can simultaneously meet the above requirements in an equivalent implementation with low power consumption, reducing the complexity from O (N^2) to O (N log N), thereby significantly improving computing efficiency
Schematic diagram 1: shows the structure of the AFF module and AFFNet network
3. AFFNet: lightweight New backbone of level visual network
By using AFF Token Mixer as the main neural network operation operator, researchers successfully constructed a lightweight neural network called AFFNet. Rich experimental results show that AFF Token Mixer achieves an excellent balance of accuracy and efficiency in a wide range of visual tasks, including visual semantic recognition and dense prediction tasks
4. Experimental results
Researchers evaluated the performance of AFF Token Mixer and AFFNet on multiple tasks such as visual semantic recognition, segmentation, and detection, and compared them with the most advanced lightweight visual backbone in the current research field. The network was compared. Experimental results show that the model design performs well in a wide range of visual tasks, confirming the potential of AFF Token Mixer as a new generation of lightweight and efficient token fusion operator
Compared with SOTA, Figure 2 shows the Acc-Param and Acc-FLOPs curves on the ImageNet-1K data set
Compare the results of state-of-the-art methods with the ImageNet-1K dataset, see Table 1
Table 2 shows the results of the visual detection and segmentation tasks with the Comparison of advanced technologies
5. Conclusion
This study proves that the frequency domain transformation in the latent space plays an important role in global adaptive token fusion and is an efficient and A low-power equivalent implementation. It provides new research ideas for the design of Token fusion operators in neural networks, and provides new development space for deploying neural network models on edge devices, especially when storage and computing capabilities are limited
The above is the detailed content of New backbone of lightweight visual network: efficient Fourier operator token mixer. For more information, please follow other related articles on the PHP Chinese website!

Exploring the Inner Workings of Language Models with Gemma Scope Understanding the complexities of AI language models is a significant challenge. Google's release of Gemma Scope, a comprehensive toolkit, offers researchers a powerful way to delve in

Unlocking Business Success: A Guide to Becoming a Business Intelligence Analyst Imagine transforming raw data into actionable insights that drive organizational growth. This is the power of a Business Intelligence (BI) Analyst – a crucial role in gu

SQL's ALTER TABLE Statement: Dynamically Adding Columns to Your Database In data management, SQL's adaptability is crucial. Need to adjust your database structure on the fly? The ALTER TABLE statement is your solution. This guide details adding colu

Introduction Imagine a bustling office where two professionals collaborate on a critical project. The business analyst focuses on the company's objectives, identifying areas for improvement, and ensuring strategic alignment with market trends. Simu

Excel data counting and analysis: detailed explanation of COUNT and COUNTA functions Accurate data counting and analysis are critical in Excel, especially when working with large data sets. Excel provides a variety of functions to achieve this, with the COUNT and COUNTA functions being key tools for counting the number of cells under different conditions. Although both functions are used to count cells, their design targets are targeted at different data types. Let's dig into the specific details of COUNT and COUNTA functions, highlight their unique features and differences, and learn how to apply them in data analysis. Overview of key points Understand COUNT and COU

Google Chrome's AI Revolution: A Personalized and Efficient Browsing Experience Artificial Intelligence (AI) is rapidly transforming our daily lives, and Google Chrome is leading the charge in the web browsing arena. This article explores the exciti

Reimagining Impact: The Quadruple Bottom Line For too long, the conversation has been dominated by a narrow view of AI’s impact, primarily focused on the bottom line of profit. However, a more holistic approach recognizes the interconnectedness of bu

Things are moving steadily towards that point. The investment pouring into quantum service providers and startups shows that industry understands its significance. And a growing number of real-world use cases are emerging to demonstrate its value out


Hot AI Tools

Undresser.AI Undress
AI-powered app for creating realistic nude photos

AI Clothes Remover
Online AI tool for removing clothes from photos.

Undress AI Tool
Undress images for free

Clothoff.io
AI clothes remover

AI Hentai Generator
Generate AI Hentai for free.

Hot Article

Hot Tools

SecLists
SecLists is the ultimate security tester's companion. It is a collection of various types of lists that are frequently used during security assessments, all in one place. SecLists helps make security testing more efficient and productive by conveniently providing all the lists a security tester might need. List types include usernames, passwords, URLs, fuzzing payloads, sensitive data patterns, web shells, and more. The tester can simply pull this repository onto a new test machine and he will have access to every type of list he needs.

PhpStorm Mac version
The latest (2018.2.1) professional PHP integrated development tool

DVWA
Damn Vulnerable Web App (DVWA) is a PHP/MySQL web application that is very vulnerable. Its main goals are to be an aid for security professionals to test their skills and tools in a legal environment, to help web developers better understand the process of securing web applications, and to help teachers/students teach/learn in a classroom environment Web application security. The goal of DVWA is to practice some of the most common web vulnerabilities through a simple and straightforward interface, with varying degrees of difficulty. Please note that this software

Dreamweaver Mac version
Visual web development tools

Dreamweaver CS6
Visual web development tools