Neural networks may no longer need activation functions? Layer Normalization also has non-linear expression!

AIxiv is the column through which this site publishes academic and technical content. Over the past few years, the AIxiv column has carried more than 2,000 reports covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, feel free to submit it or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com

The authors of this article are all from the team of Associate Professor Huang Lei at the School of Artificial Intelligence, Beihang University, and the National Key Laboratory of Complex Critical Software Environment. The first author, Ni Yunhao, is a first-year graduate student; the second author, Guo Yuxin, is a third-year graduate student; and the third author, Jia Junlong, is a second-year graduate student. The corresponding author is Associate Professor Huang Lei (homepage: https://huangleibuaa.github.io/).

Neural networks are usually composed of three parts: linear layers, nonlinear layers (activation functions), and normalization layers. The linear layers hold most of the network's parameters, the nonlinear layers give the network its expressive power, and the normalization layers (Normalization) mainly stabilize and accelerate training. Little work has studied the expressive ability of normalization itself. Batch Normalization, for example, can be regarded as a linear transformation at inference time and introduces no nonlinearity, so researchers have generally believed that normalization cannot improve a model's expressive power.
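To make the claim about Batch Normalization concrete, here is a minimal numpy sketch (our illustration, with made-up statistics): at inference time BN uses frozen running statistics, so its action on a feature reduces to scale * x + shift, an affine map that a neighboring linear layer could absorb.

```python
import numpy as np

# Inference-time BatchNorm for one channel (running statistics are frozen).
running_mean, running_var = 0.3, 2.0      # hypothetical frozen statistics
gamma, beta, eps = 1.5, -0.2, 1e-5        # learned affine parameters

def bn_inference(x):
    return gamma * (x - running_mean) / np.sqrt(running_var + eps) + beta

# The same map written as a single affine transformation: scale * x + shift.
scale = gamma / np.sqrt(running_var + eps)
shift = beta - scale * running_mean

x = np.random.randn(5)
assert np.allclose(bn_inference(x), scale * x + shift)
print("inference-time BN is exactly scale * x + shift")
```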

However, the paper "On the Nonlinearity of Layer Normalization", recently published at ICML 2024 by Associate Professor Huang Lei's team at the School of Artificial Intelligence, Beihang University, points out that layer normalization (Layer Normalization, LN) and its computationally simplified variant RMSNorm do have nonlinear expressive power, and it analyzes LN's universal classification ability in detail.


  • Paper address: https://arxiv.org/abs/2406.01255

The paper mathematically proves the nonlinearity of LN and proposes LN-Net, a simple neural network containing only linear layers and LN. In theory, if it is deep enough, it can correctly classify any given samples with any given labels. This finding breaks the habitual view that the various normalization layers are linear transformations with no fitting ability; the nonlinear layer and the normalization layer are no longer disjoint neural network modules.

Today, with the widespread use of Transformers, LN has become a fixed and commonly used component. By providing a new theoretical basis for neural network architecture design in this direction, this research may prove to be of groundbreaking significance.

The mathematical discovery of LN nonlinearity

To study this nonlinearity, the article does not directly analyze the analytical properties of LN itself, but instead examines, in a more practical way, how LN interacts with data.

The author first proposes the statistic SSR (Sum of Squares Ratio) to describe the linear separability of samples from two classes. When a linear transformation is applied to the samples, the SSR changes as well, so the minimum SSR over all linear transformations is defined as the LSSR. The article points out that the smaller the LSSR, the stronger the linear separability of the samples.

However, when the linear transformation applied to the samples is replaced by the structure "linear transformation - LN - linear transformation", the resulting new SSR can be lower than the LSSR, which verifies the nonlinear expressive power of LN: if LN were linear, then "linear transformation - LN - linear transformation" would also be linear, and the resulting SSR could not fall below the LSSR.
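One quick way to see why this argument has force is to check directly that LN fails the defining properties of a linear map. The numpy sketch below (an illustration of LN's nonlinearity, not the paper's SSR computation) shows that LN is scale-invariant rather than homogeneous, and not additive either; if LN were linear, the whole "linear transformation - LN - linear transformation" sandwich would have to be linear too.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize a single d-dimensional vector across its features
    mu = x.mean()
    sigma = np.sqrt(x.var() + eps)
    return (x - mu) / sigma

x = np.array([1.0, -2.0, 0.5, 3.0])
y = np.array([0.0, 1.0, -1.0, 2.0])

# A linear map f must satisfy f(2x) = 2 f(x) and f(x + y) = f(x) + f(y).
print(layer_norm(2 * x))   # equals layer_norm(x): LN is scale-invariant
print(2 * layer_norm(x))   # twice as large, so homogeneity fails
print(np.allclose(layer_norm(x + y),
                  layer_norm(x) + layer_norm(y)))  # False: additivity fails
```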

Arbitrary separability of LN in classification problems

For further analysis, the author splits LN into two steps: centering and scaling. Centering is mathematically a linear transformation, so the nonlinearity of LN lies mainly in the scaling operation (also called spherical projection in the article; this is exactly the operation performed by RMSNorm). Taking the simplest linearly inseparable dataset, XOR, as an example, the author correctly classifies its four points using only linear transformations and spherical projection.
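The paper's own XOR construction is not reproduced here, but the numpy sketch below gives one explicit set of hand-chosen weights (ours, for illustration) for a linear layer, a parameter-free LN, and a linear readout, with no activation function anywhere, that classifies the four XOR points correctly.

```python
import numpy as np

def layer_norm(h, eps=1e-5):
    return (h - h.mean()) / np.sqrt(h.var() + eps)

# XOR: (0,0) and (1,1) are class 0, (1,0) and (0,1) are class 1.
X = np.array([[0, 0], [1, 1], [1, 0], [0, 1]], dtype=float)
y = np.array([0, 0, 1, 1])

# Hand-chosen weights (illustrative, not taken from the paper).
W = np.array([[ 1.0,  1.0],
              [-1.0, -1.0],
              [ 0.0,  0.0],
              [ 0.0,  0.0]])
b = np.array([-1.0, 1.0, 0.3, -0.3])   # hidden features sum to zero by construction
u = np.array([0.0, 0.0, -5.0 / 3.0, 0.0])
c = 1.0

def predict(x):
    h = W @ x + b                      # linear layer
    z = layer_norm(h)                  # LN with no learned affine parameters
    return int(u @ z + c < 0)          # linear readout, class 1 if negative

preds = [predict(x) for x in X]
print(preds, "matches labels:", preds == list(y))   # [0, 0, 1, 1] matches labels: True
```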


More generally, the author gives an algorithm that correctly classifies any number of samples using only LN and linear layers, thereby exploring the universal approximation ability of LN-Net.


By constructing the algorithm step by step, the layer-by-layer transformation of the neural network is reduced to a sample-merging problem, and the universal classification problem is in turn reduced to sample merging. The author shows that for m samples with arbitrary labels, one can construct an O(m)-layer LN-Net that classifies all m samples correctly. The construction also provides a new way to estimate the VC dimension of neural networks: on this basis, the author infers that an LN-Net with L normalization layers has a VC dimension of at least L + 2.
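For reference, here is a minimal PyTorch sketch of the architecture family the theorem talks about: linear layers interleaved with parameter-free LayerNorm and no activation functions. The widths, depth, and the explicit sample-merging weights of the paper's construction are not reproduced; only the LN-Net skeleton is shown.

```python
import torch
import torch.nn as nn

class LNNet(nn.Module):
    """Linear layers interleaved with LayerNorm only; no activation functions."""
    def __init__(self, in_dim, hidden_dim, num_classes, num_ln_layers):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(num_ln_layers):
            layers.append(nn.Linear(d, hidden_dim))
            layers.append(nn.LayerNorm(hidden_dim, elementwise_affine=False))
            d = hidden_dim
        layers.append(nn.Linear(d, num_classes))   # linear readout
        self.net = nn.Sequential(*layers)

    def forward(self, x):
        return self.net(x)

# Example: a 3-LN-layer network on 2-D inputs with 2 output classes.
model = LNNet(in_dim=2, hidden_dim=4, num_classes=2, num_ln_layers=3)
logits = model(torch.randn(8, 2))
print(logits.shape)  # torch.Size([8, 2])
```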


Enhancing the nonlinearity of LN and practical applications

Building on the proof of LN's nonlinearity, the author proposes grouped layer normalization (LN-G) to further strengthen the nonlinearity of LN for practical use. The author argues mathematically, from the perspective of the Hessian matrix, that grouping strengthens the nonlinearity of LN, and preliminarily explores the expressive power of LN-G in experiments.
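The paper defines LN-G precisely; the PyTorch sketch below only shows the general idea as we read it: split the feature dimension into groups and normalize each group with its own statistics, so every sample passes through several spherical projections instead of one. The group count and the function signature are our own choices.

```python
import torch

def ln_group(x, num_groups, eps=1e-5):
    """Group-wise layer normalization over the last dimension.

    x: (batch, features); features must be divisible by num_groups.
    Each group is centered and scaled with its own mean and variance.
    """
    b, d = x.shape
    g = x.view(b, num_groups, d // num_groups)
    mu = g.mean(dim=-1, keepdim=True)
    var = g.var(dim=-1, unbiased=False, keepdim=True)
    return ((g - mu) / torch.sqrt(var + eps)).view(b, d)

x = torch.randn(8, 16)
out = ln_group(x, num_groups=4)
print(out.shape)  # torch.Size([8, 16])
```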

The author notes that on CIFAR-10 with random labels, a model built only from linear layers achieves no more than 20% accuracy, whereas a network composed of linear layers and LN-G, without any traditional activation function as a nonlinear unit, reaches 55.85% accuracy.
The author further explores the classification performance of LN-G in convolutional neural networks without activation functions, and shows experimentally that such activation-free networks indeed have strong fitting ability. In addition, by analogy with applying GN to an entire sample in an MLP (stretching a single sample into a one-dimensional vector and then applying GN), the author proposes LN-G-Position. Applying LN-G-Position to a ResNet with no nonlinear layers achieves 86.66% accuracy on CIFAR-10, reflecting its strong expressive power.
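Following the description above, LN-G-Position appears to flatten each sample (channels and spatial positions together) into one long vector before group-normalizing it. The PyTorch sketch below is a hedged reading of that description, not the paper's implementation.

```python
import torch

def ln_g_position(x, num_groups, eps=1e-5):
    """Flatten each sample to 1-D, then normalize group-wise (our reading of LN-G-Position).

    x: (batch, channels, height, width); channels*height*width must be divisible by num_groups.
    """
    b = x.shape[0]
    flat = x.reshape(b, -1)               # stretch the whole sample into one vector
    g = flat.view(b, num_groups, -1)      # split that vector into groups
    mu = g.mean(dim=-1, keepdim=True)
    var = g.var(dim=-1, unbiased=False, keepdim=True)
    normed = (g - mu) / torch.sqrt(var + eps)
    return normed.view_as(x)

x = torch.randn(2, 8, 4, 4)               # 8*4*4 = 128 features per sample
print(ln_g_position(x, num_groups=16).shape)  # torch.Size([2, 8, 4, 4])
```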
The author then carries out an experimental study on Transformers, replacing the original LN with LN-G. The results show that grouped layer normalization effectively improves the performance of the Transformer network, demonstrating that the theory is feasible in real networks.

Conclusion and Outlook

In the paper "On the Nonlinearity of Layer Normalization", the author theoretically proved for the first time the universal classification ability of a model containing only linear layers and LN and given a specific depth The VC dimension lower bound of the model. The most important significance here is that the analysis of the expressive ability of traditional deep neural networks has taken a big step towards the widely used modern real networks. This may provide new ideas for future neural network structure design. ideas.
